threads.tex

include(`macros.m4')

\pagebreak
\pdfbookmark[0]{threads and their synchronization}{threads}

\begin{slide}
\sltitle{Contents}
\slidecontents{8}
\end{slide}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{slide}
\sltitle{Threads}
\begin{itemize}
\item \emph{thread} = \emph{thread of execution}, a basic software ``thing''
that can do work on a computer
\item classic Unix model: single threaded processes
\item with introduction of threads, a process becomes just a container for
threads
\item advantages of multithreaded applications
  \begin{itemize}
  \item speed-up -- a typical objective is having threads on multiple CPUs
  running in parallel
  \item more modular programming
  \end{itemize}
\item disadvantages
  \begin{itemize}
  \item more complex code
  \item debugging may become more difficult
  \end{itemize}
\end{itemize}
\end{slide}

\begin{itemize}
\item \emsl{While one has to put resources into sharing data when working with
processes, one has to put resources into managing
inherent data sharing if working with threads.}  Note that all threads of the
same process have equal access to the process virtual address space.
\item Not all applications are fit for multithreading as some tasks are not
inherently concurrent in nature.
\item Even though debuggers typically support threads, debugging changes timing
so the problem may not reproduce when using the debugger.  That is usually not
an issue in a single threaded application.
\item There is an excellent book on programming with POSIX threads by Butenhof,
see page \pageref{REF_PROGRAMMING}. You can also use an online book
\emph{Multithreaded Programming Guide} available on
\url{http://docs.oracle.com}.
\item \hlabel{PRIVILEGE_SEPARATION} An example of a situation where you do not
want to use threads is when you need to change the real and effective UID of a
process.  Take OpenSSH -- every connection is served by two processes.  One,
with maximum privileges, usually runs as root, and provides services
(allocating a pseudo terminal is one of them) to a second, unprivileged
process.  The idea is that most of the OpenSSH code does not need any special
privilege so if a bug is found in code that is run under an unprivileged user,
the damage is much smaller than if such code was run with maximum privileges.
This technique is called \emph{privilege separation} and you could not do the
same thing with threads because all threads of a process share the same address
space and, per POSIX, the same user IDs.
\end{itemize}

\begin{slide}
\sltitle{Implementation of threads}
\setlength{\baselineskip}{0.8\baselineskip}
\begin{description}
\item[library-thread model (1:N)]~\\\vspace{-2.5ex}
    \begin{itemize}
    \item threads are implemented in a library.  Kernel has no knowledge of
    such threads.
    \item run-time library schedules threads on processes and kernel schedules
    processes on CPUs
    \item[$\oplus$] less overhead
    \item[$\ominus$] multiple threads of the same process cannot run in parallel
    \end{itemize}
\item [kernel-thread model (1:1)]~\\\vspace{-2.5ex}
    \begin{itemize}
    \item threads are a first class kernel citizen
    \item[$\oplus$] multiple threads of the same process can run in parallel on
    multiple CPUs
    \end{itemize}
\item[hybrid models (M:N)]~\\\vspace{-2.5ex}
    \begin{itemize}
    \item N library threads scheduled on M kernel threads, $N \geq M$
    \item[$\ominus$] too complex to implement, not really used today
    \end{itemize}
\end{description}
\end{slide}

\begin{itemize}
\item Original Unix systems used library models (sometimes called
\emph{lightweight threads} or \emph{green threads}).  Today most systems
stick to the 1:1 model.  There was some evolution in the past, e.g. Solaris
used the M:N model up to Solaris 8 and switched to the 1:1 model in Solaris 9.
\item The drawback of the 1:1 model is that there is always some non-trivial
overhead associated with thread creation as the library has to call into the
kernel as well in order to create the associated kernel thread.
\item Threads implemented in a library may be either preemptive or
non-pre\-emp\-tive.  To achieve preemption, you can use timers and signals.
However, if the objective is more modular code rather than real parallelism,
non-preemptive threads usually do fine.  Threads are then switched at points
where the process would normally block in a system call.
\item \hlabel{SETJMP} If a system call blocks in a library implemented thread
model, the whole process will block as the kernel has no knowledge there are
more threads in the process.  The threading library is therefore written so
that only non-blocking calls are used.  When such a call would block, the
library saves the thread context and switches to another thread via the
\funnm{setjmp}() and \funnm{longjmp}() library functions (these are not system
calls).  Example: \example{pthreads/setjmp.c}.  Another way is to use a
non-standard API for user context manipulation, see \texttt{ucontext.h}, if the
system provides it.  Both Linux and macOS support it.
\item To implement a 1:N library, there are several things to consider:
\begin{itemize}
\item how to deal with threads trying to install their own \texttt{SIGALRM}
handler if the library already uses that signal to trigger the
dispatcher/scheduler periodically
\item how to implement separate stacks for each thread (especially when
\texttt{setjmp} was chosen as a building block)
\item what the thread states should be
\item how to prevent all threads from being blocked when one thread blocks,
e.g. on I/O
\end{itemize}
\end{itemize}
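\begin{itemize}
\item The user context API mentioned above can be sketched as follows.  This is
only a minimal illustration of cooperative (non-preemptive) switching between
two library threads with separate stacks, assuming the system provides
\texttt{ucontext.h}.  The names \texttt{worker} and
\texttt{run\_green\_threads} and the stack size are made up for this sketch,
and there is no handling of blocking calls or finished threads.

\begin{verbatim}
#include <stdio.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)

static ucontext_t main_ctx, thr_ctx[2];
static char stacks[2][STACK_SIZE];   /* one stack per thread */
static int steps;                    /* steps done by all threads */

/* Body of a library thread: do one step, then yield. */
static void
worker(int id)
{
    for (int i = 0; i < 3; i++) {
        printf("thread %d, step %d\n", id, i);
        steps++;
        /* Save our context and resume the dispatcher. */
        swapcontext(&thr_ctx[id], &main_ctx);
    }
}

/* Trivial round-robin dispatcher; returns the number of steps. */
int
run_green_threads(void)
{
    steps = 0;
    for (int id = 0; id < 2; id++) {
        getcontext(&thr_ctx[id]);
        thr_ctx[id].uc_stack.ss_sp = stacks[id];
        thr_ctx[id].uc_stack.ss_size = STACK_SIZE;
        /* Where to continue if worker() ever returns. */
        thr_ctx[id].uc_link = &main_ctx;
        makecontext(&thr_ctx[id], (void (*)(void))worker, 1, id);
    }
    for (int round = 0; round < 3; round++)
        for (int id = 0; id < 2; id++)
            swapcontext(&main_ctx, &thr_ctx[id]);
    return (steps);
}
\end{verbatim}

The kernel sees a single thread here.  Each \funnm{swapcontext}() call is a
voluntary switch, so a thread that never yields would starve the other one.
\end{itemize}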

\begin{slide}
\sltitle{POSIX threads (pthreads)}

\begin{itemize}
\item first came with IEEE Std 1003.1c-1995
\item POSIX thread API uses a prefix \texttt{pthread\_}
\item these functions return 0 (= OK) or an error number (values as for
\texttt{errno})
\begin{itemize}
\item \dots{} functions do \emsl{not} set \texttt{errno}
\item so you cannot use functions \funnm{perror}() or \funnm{err}()
\end{itemize}
\item the standard also defines other functions, for example reentrant variants
of functions that could not be made usable with threads without changing their
API (e.g. \texttt{readdir\_r}, \texttt{strtok\_r}, etc.)
\begin{itemize}
\item \texttt{\_r} means \emph{reentrant}, i.e. the function can be called by
multiple threads without any side effects
\end{itemize}
\end{itemize}

\end{slide}

\hlabel{POSIXTHREADS}

\begin{itemize}
\item General information on POSIX is on page \pageref{POSIX}.
\item There are more threading APIs, the POSIX thread API is just one of them.
For example, there is a system call \texttt{sproc()} on IRIX, then
ifdef([[[NOSPELLCHECK]]], [[[Cthreads]]]),
Solaris threads, GNU ifdef([[[NOSPELLCHECK]]], [[[Pth]]]) (= GNU Portable
Threads), \dots
\item The POSIX thread API is available in different libraries on different
systems.  For example, on Linux you usually need \texttt{-lpthread} but on
Solaris the API is part of standard \texttt{libc}.  With \texttt{gcc}, instead
of \texttt{-lpthread}, you can use \texttt{-pthread} and the compiler will do
what is needed for the specific system (which does not have to be Linux).
\item Each POSIX thread API implementation is usually built on top of the native
threading library.  For example, on Solaris, it's the \texttt{thr\_} API
functions.
\item We will talk more on reentrant functions in connection with threads on
page \pageref{THREADSAFE}.
\item As already mentioned, given that the POSIX thread API uses \texttt{errno}
codes directly as return values, the following piece of code is not correct:

\begin{verbatim}
if (pthread_create(&thr, NULL, thrfn, NULL) != 0)
        err(1, "pthread_create");
\end{verbatim}

as \texttt{err()} will possibly print something like the following on error
(unless \texttt{errno} was set by previous code, which would make it even more
confusing):

\begin{itemize}
\item ``\texttt{a.out: pthread\_create: Success}'' on Linux distributions
\item ``\texttt{a.out: pthread\_create: Error 0}'' on Solaris
\item ``\texttt{a.out: pthread\_create: Unknown error: 0}'' on FreeBSD
\item or something else based on the system and the concrete message it uses for
\texttt{errno} equal to 0, unless \texttt{errno} is already set otherwise.
\end{itemize}

The Linux approach might confuse the programmer as leaving \texttt{errno} zero
does not have to mean the function did not fail, as we just showed.  FreeBSD
makes it obvious that something is not entirely right.  Example that shows such
a situation: \example{pthreads/wrong-err-use.c}. The correct code could look
like this:

\begin{verbatim}
int e;

if ((e = pthread_create(&thr, NULL, thrfn, NULL)) != 0)
        errx(1, "pthread_create: %s", strerror(e));
\end{verbatim}

\item \hlabel{ERRNO_IN_THREADS} Other functions that set \texttt{errno} work the
same way in multithreaded programs because each thread has its own
\texttt{errno}.  To make that possible, \texttt{errno} is typically defined as
a macro that expands to a function call (the function can either return the
value or an address which is then dereferenced).  Check
\texttt{/usr/include/errno.h} on Linux if interested.
\end{itemize}
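\begin{itemize}
\item The per thread \texttt{errno} can be observed directly.  The following is
a small sketch, not taken from the example repository, in which a new thread
records the address of its own \texttt{errno} so that the creating thread can
compare it with the address of its own.  The function names are hypothetical.

\begin{verbatim}
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/* Stores the address of this thread's errno into *arg. */
static void *
save_errno_addr(void *arg)
{
    *(uintptr_t *)arg = (uintptr_t)&errno;
    return (NULL);
}

/* Returns 1 if the new thread has its own errno, -1 on error. */
int
errno_is_per_thread(void)
{
    pthread_t thr;
    uintptr_t thr_addr = 0;

    /* Error numbers are returned directly, errno is not set. */
    if (pthread_create(&thr, NULL, save_errno_addr, &thr_addr) != 0)
        return (-1);
    if (pthread_join(thr, NULL) != 0)
        return (-1);
    return (thr_addr != (uintptr_t)&errno);
}
\end{verbatim}

The addresses are compared as integers because the \texttt{errno} of the
finished thread no longer exists and must not be dereferenced.
\end{itemize}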

%%%%%

\begin{slide}
\sltitle{Example: thread creation}
{\catcode95=12\catcode38=12
\begin{center}
\input{img/tex/threads.pstex_t}
\end{center}}
\end{slide}

\begin{itemize}
\item This is a trivial example.  The process (main thread) creates two more
threads and waits for them to finish.  This process thus has 3 threads in
total.
\end{itemize}

%%%%%

ifdef([[[NOSPELLCHECK]]], [[[
\pdfbookmark[1]{pthread\_create}{thrcreate}
]]])

\begin{slide}
\hlabel{PTHREAD_T}
\sltitle{Thread creation}
ifdef([[[NOSPELLCHECK]]], [[[
\funml{int \funnm{pthread\_create}(\=pthread\_t *\emph{thread},
\\\>const pthread\_attr\_t *\emph{attr},
\\\>void *(*\emph{start\_fn})(void*), void *\emph{arg});}
]]])
\begin{itemize}
\item creates a new thread, puts its ID to \emph{thread}
\item with attributes from \texttt{attr}, e.g. its stack size,
\texttt{NULL} means default attributes
\item function \emph{\texttt{start\_fn}}() will be started in the thread using
argument \emph{\texttt{arg}}. After the function returns, the thread ceases to
exist.
\item the \texttt{pthread\_attr\_t} objects can be manipulated using
\funnm{pthread\_attr\_init}(), \funnm{pthread\_attr\_destroy}(),
\funnm{pthread\_attr\_setstacksize}(), etc\dots{}
\end{itemize}
\end{slide}

\begin{itemize}
\item Once a thread is created, it is up to the scheduler when the thread
will really start running.  The thread can voluntarily give up the CPU by using
\texttt{sched\_yield} (this is a POSIX function, unlike
\texttt{pthread\_yield}).
\item If the main thread returns from \funnm{main}() (or \funnm{exit}() is
called) without waiting for the other threads to finish, the whole process
terminates even though some threads are still running,
see \example{pthreads/pthread\_create.c}
\item Be careful not to use this:
\begin{alltt}
for (i = 0; i < N; i++)
    pthread\_create(&tid, attr, start\_routine, &i);
\end{alltt}

It looks like we pass each thread its index.  However, all threads get the same
address, and before a started thread gets scheduled to run, the next iteration
might happen, modifying \texttt{i}.
\item \hlabel{WRONG_USE_OF_ARG} Examples: \example{pthreads/wrong-use-of-arg.c},
\example{pthreads/correct-use-of-arg.c}.
\item If you need to pass only one value, you could use the following
(\textbf{note that the result of such a conversion is implementation-defined in
the C standard so it is not portable}):

\begin{alltt}
assert(sizeof (void *) >= sizeof (int));
for (i = 0; i < N; i++)
    pthread\_create(&tid, attr, start\_routine, (void *)(intptr\_t)i);
\end{alltt}

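\dots and in function \texttt{void *start\_routine(void *arg)} cast the pointer
back to an integer, going through \texttt{intptr\_t} again to avoid a compiler
warning about a cast to an integer of a different size.

\begin{alltt}
printf("thread \%d started\bs{}n", (int)(intptr\_t)arg);
\end{alltt}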

\hlabel{INT_AS_ARG} Example: \example{pthreads/int-as-arg.c}
\item If you need to pass more bytes than the size of the pointer, you must
pass a pointer to memory where the passed data is stored, or use global
variables.  Accessing global variables must be synchronized, of course.  More on
that on page \pageref{THREADSYNCHRONIZATION}.
\item \hlabel{PTHREAD_CREATE_CYCLE} \texttt{pthread\_t} is an opaque type and
its implementation is not your concern.  Usually it is an integer, though, used
to map to the native threads provided by the system.  If you create several
threads, you need to pass the address of a different \texttt{pthread\_t}
variable for each of them, otherwise you will not be able to further manipulate
the threads from the main thread, e.g. wait for them to finish.
\end{itemize}
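\begin{itemize}
\item The notes above on passing arguments and on keeping one
\texttt{pthread\_t} per thread can be combined as in the following sketch.  The
structure \texttt{struct targ} and the function \texttt{sum\_of\_squares} are
made up for illustration.  Every thread gets a pointer to its own array element
so there is no race on the argument.

\begin{verbatim}
#include <pthread.h>

#define NTHREADS 4

struct targ {
    int id;      /* input: thread number */
    int square;  /* output: filled in by the thread */
};

static void *
start_routine(void *arg)
{
    struct targ *t = arg;

    t->square = t->id * t->id;
    return (NULL);
}

/* Returns the sum of squares of 0..NTHREADS-1, or -1 on error. */
int
sum_of_squares(void)
{
    pthread_t tids[NTHREADS];      /* one ID per thread */
    struct targ args[NTHREADS];    /* one argument per thread */
    int i, n, sum = 0;

    for (n = 0; n < NTHREADS; n++) {
        args[n].id = n;
        if (pthread_create(&tids[n], NULL, start_routine,
            &args[n]) != 0)
            break;
    }
    /* Join whatever was created; args[] must outlive threads. */
    for (i = 0; i < n; i++) {
        (void) pthread_join(tids[i], NULL);
        sum += args[i].square;
    }
    return (n == NTHREADS ? sum : -1);
}
\end{verbatim}

The arguments live on the stack of the creating function, which is fine only
because the function joins all threads before it returns.
\end{itemize}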

%%%%%

ifdef([[[NOSPELLCHECK]]], [[[
\pdfbookmark[1]{pthread\_self, pthread\_key\_create}{pthreadkey}
\hlabel{THREAD_ATTRS}
]]])

\begin{slide}
\sltitle{Thread private attributes}
\begin{itemize}
\item instruction pointer
\item stack (automatic variables)
\item thread ID, available through
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{pthread\_t \funnm{pthread\_self}(void);}
]]])
\item scheduling priority and policy
\item value of \texttt{errno}
\item thread specific data -- a pair of
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{(pthread\_key\_t \emph{key}, void *\emph{ptr})}
]]])
\item signal mask
\end{itemize}
\end{slide}

\begin{itemize}
\item A key created by \funnm{pthread\_key\_create}() is visible from all
threads. However, in every thread the key may be associated with a different
value via \funnm{pthread\_setspecific}().
\item Each thread has a fixed size stack which \emsl{does not automatically
increase.}  It is usually anywhere from 64 kilobytes to a few megabytes.  If you
cross that limit, the program will quite probably crash.  If you want a stack
larger than the system default, you have to set the stack size via an attribute
when creating the thread.
The attribute is set using the \funnm{pthread\_attr\_setstacksize}() function.
\\
Example: \example{pthreads/pthread-stack-overflow.c}
\item You can read more about thread specific data on page
\pageref{THREAD_SPECIFIC_DATA}.
\item More on per thread signal mask is on page \pageref{PTHREADSIGMASK}.
\end{itemize}
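\begin{itemize}
\item Setting the stack size could look like the following sketch.  The sizes
and the function names are arbitrary choices made for this illustration.  The
thread function needs about 4 megabytes of automatic data, so the stack is
requested with one extra megabyte of headroom.

\begin{verbatim}
#include <pthread.h>
#include <stddef.h>

#define BIG (4 * 1024 * 1024)   /* 4 MB of automatic data */

static void *
use_big_stack(void *arg)
{
    volatile char buf[BIG];

    (void)arg;
    /* Touch every page so the memory is really used. */
    for (size_t i = 0; i < BIG; i += 4096)
        buf[i] = 1;
    return (NULL);
}

/* Returns 0 on success or an error number. */
int
run_with_big_stack(void)
{
    pthread_attr_t attr;
    pthread_t thr;
    int e;

    if ((e = pthread_attr_init(&attr)) != 0)
        return (e);
    e = pthread_attr_setstacksize(&attr, BIG + 1024 * 1024);
    if (e == 0)
        e = pthread_create(&thr, &attr, use_big_stack, NULL);
    /* The attribute object is not needed after creation. */
    (void) pthread_attr_destroy(&attr);
    if (e != 0)
        return (e);
    return (pthread_join(thr, NULL));
}
\end{verbatim}

Whether the same function would survive with the default stack depends on the
system, which is exactly why the size should be requested explicitly.
\end{itemize}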

%%%%%

ifdef([[[NOSPELLCHECK]]], [[[
\pdfbookmark[1]{pthread\_exit, pthread\_join, pthread\_detach}{pthreadexit}
]]])

\begin{slide}
\sltitle{Terminating the calling thread}
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{void \funnm{pthread\_exit}(void *\emph{val\_ptr});}
]]])
\begin{itemize}
\item terminates the calling thread, it is similar to \funnm{exit}() for
processes
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{int \funnm{pthread\_join}(pthread\_t \emph{thr},
void **\emph{val\_ptr});}
]]])
\begin{itemize}
\item waits for thread \emph{\texttt{thr}} to finish, the value passed to
\funnm{pthread\_exit}() or the return value is stored in the location referenced
by \emph{\texttt{val\_ptr}}
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{int \funnm{pthread\_detach}(pthread\_t \emph{thr});}
]]])
\begin{itemize}
\item indicates that storage for thread \emph{\texttt{thr}} can be reclaimed
when the thread terminates. \funnm{pthread\_join}() can no longer be used.
\end{itemize}
\end{slide}

\begin{itemize}
\item If \funnm{pthread\_exit}() is not used, it is implicitly called when the
thread function returns, with the value from the function \texttt{return}.
\item The contents of the exiting thread's stack are undefined after the thread
terminates, so the value passed to \funnm{pthread\_exit}() (and later stored
via \emph{\texttt{val\_ptr}}) must not point to local variables of the thread
function.
\item If you do not intend to call \funnm{pthread\_join}(), you need to call
\funnm{pthread\_det\-ach}() or use the attributes (see below).  If you do not,
the memory needed to carry information for a subsequent \funnm{pthread\_join}()
will not be freed.  The situation is similar to accumulating zombies.  You can
call it like this in the thread function:
\begin{alltt}
pthread\_detach(pthread\_self());
\end{alltt}
\item You can also set the thread attributes when creating the thread, using
\funnm{p\-thr\-ead\_attr\_setdetachstate}() with
\texttt{PTHREAD\_CREATE\_DETACHED} on the attribute variable and then use that
in \funnm{pthread\_create}().  Example on setting the attributes:
\example{pthreads/set-detachstate.c}
\item You can use \texttt{NULL} as \emph{\texttt{val\_ptr}} in
\funnm{pthread\_join}(), telling the system you are not interested in the return
value.
\item Any thread can wait for another thread, not just the one that created it.
\item We recommend always checking the return value of \funnm{pthread\_join}()
to make sure you waited for the right thread.  If you use an incorrect thread
ID, the function returns immediately with an error.
\item In contrast to waiting for processes to finish, \emsl{one cannot wait for
any thread to finish}.  The rationale is that since there is no parent--child
relationship between threads, it was not deemed necessary.  However, some
systems provide that functionality, e.g. on Solaris you can use \texttt{0} for
a thread ID in \funnm{thr\_join}().  If you need this functionality with the
POSIX thread API, it is easy to set threads as \emph{detached} and use a
condition variable together with a global variable.  More on that on page
\pageref{CONDITION_VARIABLES}.
\item \hlabel{PTHREAD_JOIN} Examples: \example{pthreads/pthread-join.c},
\example{pthreads/pthread-detach-join.c}
\end{itemize}
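\begin{itemize}
\item A sketch of returning a result from a thread safely.  The result is
allocated on the heap, not on the stack of the exiting thread, and the joining
thread frees it.  The function names are made up for illustration.

\begin{verbatim}
#include <pthread.h>
#include <stdlib.h>

static void *
compute(void *arg)
{
    int *res = malloc(sizeof (*res));

    if (res != NULL)
        *res = *(int *)arg * 2;
    /* Same effect as "return (res);" from this function. */
    pthread_exit(res);
}

/* Returns twice the input computed in a thread, -1 on error. */
int
twice_in_thread(int input)
{
    pthread_t thr;
    void *val;
    int out;

    if (pthread_create(&thr, NULL, compute, &input) != 0)
        return (-1);
    /* The value given to pthread_exit() is stored in val. */
    if (pthread_join(thr, &val) != 0 || val == NULL)
        return (-1);
    out = *(int *)val;
    free(val);
    return (out);
}
\end{verbatim}

Passing the address of \texttt{input} is safe here only because the creating
function waits for the thread before it returns.
\end{itemize}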


%%%%%

ifdef([[[NOSPELLCHECK]]], [[[
\pdfbookmark[1]{pthread\_once}{pthreadonce}
]]])

\begin{slide}
\sltitle{Initialization}
ifdef([[[NOSPELLCHECK]]], [[[
\funml{int \funnm{pthread\_once}(\=pthread\_once\_t *\emph{once\_control},
\\\>void (*\emph{init\_routine})(void));}
]]])
\begin{itemize}
\item in \emph{once\_control} you pass a pointer to statically initialized
variable
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{pthread\_once\_t \emph{once\_control} = PTHREAD\_ONCE\_INIT;}
]]])
\item first thread that calls \funnm{pthread\_once}() calls
\emph{init\_routine()}.  Other threads will not call the function, and if it has
not finished yet, will block waiting for it to finish.
\item you can use it for dynamic initialization of global data in libraries
where multiple threads may be using the library API at the same time
\end{itemize}
\end{slide}

\begin{itemize}
\item This function is primarily meant to be used in libraries and rarely
needed in a program. In the latter case, the initialization function can be
called before the first thread is created.
\item The behavior is undefined if \emph{once\_control} is an automatic (local)
variable or was not initialized with \texttt{PTHREAD\_ONCE\_INIT}.
\item This is handy for lazy initialization, i.e. the library becomes
initialized only after one of the threads calls into the library API (as
opposed to when the library is being loaded), and the semantics of
\texttt{pthread\_once} make sure this happens only once.
\end{itemize}
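\begin{itemize}
\item The lazy initialization described above might look like this sketch.
The ``library'' entry point \texttt{lib\_api\_call} and the counter are
hypothetical and serve only to show that the initialization routine runs once
no matter how many threads call in.

\begin{verbatim}
#include <pthread.h>

#define NTHREADS 8

static pthread_once_t once_control = PTHREAD_ONCE_INIT;
static int init_calls;   /* how many times init_routine() ran */

static void
init_routine(void)
{
    init_calls++;        /* runs once, so no locking is needed */
}

/* Hypothetical library entry point with lazy initialization. */
static void *
lib_api_call(void *arg)
{
    (void)arg;
    (void) pthread_once(&once_control, init_routine);
    /* ... the library state is initialized from here on ... */
    return (NULL);
}

/* Returns the number of init_routine() calls. */
int
count_init_calls(void)
{
    pthread_t tids[NTHREADS];
    int i, n = 0;

    while (n < NTHREADS &&
        pthread_create(&tids[n], NULL, lib_api_call, NULL) == 0)
        n++;
    for (i = 0; i < n; i++)
        (void) pthread_join(tids[i], NULL);
    return (init_calls);
}
\end{verbatim}
\end{itemize}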

%%%%%

ifdef([[[NOSPELLCHECK]]], [[[
\pdfbookmark[1]{pthread\_cancel, pthread\_setcancelstate, pthread\_setcanceltype}{pthreadcancel}
]]])

\begin{slide}
\sltitle{Cancel execution of a thread}
\setlength{\baselineskip}{0.9\baselineskip}
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{int \funnm{pthread\_cancel}(pthread\_t \emph{thread});}
]]])
\begin{itemize}
\item cancel \emph{thread}.  Depends on:
\end{itemize}
\texttt{int \funnm{pthread\_setcancelstate}(int \emph{state},
int *\emph{old});}
\begin{itemize}
\item sets new state and returns the old value:
    \begin{itemize}
    \item \texttt{PTHREAD\_CANCEL\_ENABLE} \dots{} cancellation allowed
    \item \texttt{PTHREAD\_CANCEL\_DISABLE} \dots{} cancellation requests
    against the target thread are held pending
    \end{itemize}
\end{itemize}
\texttt{int \funnm{pthread\_setcanceltype}(int \emph{type}, int *\emph{old});}
\begin{itemize}
\item \texttt{PTHREAD\_CANCEL\_ASYNCHRONOUS} \dots{} immediate cancellation
\item \texttt{PTHREAD\_CANCEL\_DEFERRED} \dots{} cancellation requests are held
pending until a cancellation point is reached.
\end{itemize}
\end{slide}

\begin{itemize}
\item Cancellation points will occur when a thread is executing functions
specified in the standard, like \funnm{open}(), \funnm{read}(),
\funnm{accept}(), etc.  The full list is usually in the
\texttt{pthread\_setcancelstate} man page.
\item The \funnm{pthread\_testcancel}() function creates a cancellation point in
the calling thread.  The \funnm{pthread\_testcancel}() function has no effect if
the ability to cancel is disabled.
\item Be very careful with the use of \texttt{PTHREAD\_CANCEL\_ASYNCHRONOUS} as
it may lead to data inconsistency as the cancellation may happen any time, even
in your critical sections.
\item Cleanup functions are called on cancellation, see page
\pageref{PTHREAD_CLEANUP}.  For example, if cancelling a thread holding a mutex,
you could use the cleanup function to unlock it.
\item Functions \funnm{pthread\_setcancelstate}() and
\funnm{pthread\_setcanceltype}() do for threads and cancellation what
manipulating the signal mask does for processes and signals.
\item \hlabel{PTHREAD_CANCEL} Example: \example{pthreads/pthread-setcanceltype.c}
\end{itemize}
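\begin{itemize}
\item A sketch of deferred cancellation, assuming the default cancel type.  The
thread disables cancellation around work that must not be interrupted and can
be cancelled only in \funnm{sleep}(), which is a cancellation point.  The
function names are made up for illustration.

\begin{verbatim}
#include <pthread.h>
#include <unistd.h>

static void *
worker(void *arg)
{
    int old, unused;

    (void)arg;
    for (;;) {
        pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old);
        /* ... work that must not be interrupted ... */
        pthread_setcancelstate(old, &unused);
        sleep(1);   /* a pending request is acted upon here */
    }
    return (NULL);  /* not reached */
}

/* Returns 1 if the thread ended by being cancelled. */
int
cancel_worker(void)
{
    pthread_t thr;
    void *val;

    if (pthread_create(&thr, NULL, worker, NULL) != 0)
        return (-1);
    if (pthread_cancel(thr) != 0)
        return (-1);
    if (pthread_join(thr, &val) != 0)
        return (-1);
    /* A cancelled thread "returns" PTHREAD_CANCELED. */
    return (val == PTHREAD_CANCELED);
}
\end{verbatim}

Note that \funnm{pthread\_cancel}() only posts the request.  It is the
\funnm{pthread\_join}() call that waits until the thread really terminates.
\end{itemize}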

%%%%%

ifdef([[[NOSPELLCHECK]]], [[[
\pdfbookmark[1]{pthread\_key\_create, pthread\_key\_delete,
pthread\_setspecific, pthread\_getspecific}{pthreadglobals}
]]])

\begin{slide}
\sltitle{Global variables per thread}
ifdef([[[NOSPELLCHECK]]], [[[
\funml{int \funnm{pthread\_key\_create}(\=pthread\_key\_t *\emph{key},
\\\>void (*\emph{destructor})(void *));}
]]])
\begin{itemize}
\item creates a key that can be associated with a value of
\texttt{(void *)} type. The \emph{destructor()} function is called for all keys
whose value is not \texttt{NULL} on thread termination.
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{int \funnm{pthread\_key\_delete}(pthread\_key\_t \emph{key});}
]]])
\begin{itemize}
\item deletes the key, does not change the associated data
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\funml{int \funnm{pthread\_setspecific}(\=pthread\_key\_t \emph{key},
\\\>const void *\emph{value});}
]]])
\begin{itemize}
\item binds pointer \emph{value} to previously created \emph{key}
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{void *\funnm{pthread\_getspecific}(pthread\_key\_t \emph{key});}
]]])
\begin{itemize}
\item returns the value of \emph{key}
\end{itemize}
\end{slide}

\begin{itemize}
\item \hlabel{THREAD_SPECIFIC_DATA} Global variables and dynamically
allocated data are common to all threads.  Thread specific data provides a way to
create a global variable per thread.  Note the difference between that and a
local variable in the thread function -- as you know, in C, the local variable
is not visible in other functions called from the thread function.  Thread
specific data is a very useful feature.  Imagine you have existing
multithreaded code using a global storage place which suffers from heavy
contention.  You can use thread specific data to create a storage place per
thread with minimal changes to the original code.
\item When you create a key, the \texttt{NULL} value is associated with it in
all threads.
\item If you do not need the destructor function, use \texttt{NULL}.
\item Destructors are called at thread exit in unspecified order for all keys
with a value different from \texttt{NULL}.  Before a destructor is called, the
value of the key is set to \texttt{NULL} and the original value is passed to
the destructor as its argument.  If, after all the destructors have been called
for all non-\texttt{NULL} values with associated destructors, there are still
some non-\texttt{NULL} values with associated destructors, then the process is
repeated.  If, after at least \texttt{PTHREAD\_DESTRUCTOR\_ITERATIONS}
iterations of destructor calls for outstanding non-\texttt{NULL} values, there
are still some non-\texttt{NULL} values with associated destructors, the
implementation may stop calling destructors (it usually does, otherwise you
could end up with an infinite loop).
\end{itemize}
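\begin{itemize}
\item The following sketch shows a ``global variable per thread''.  Each thread
lazily allocates its own counter, reachable from any function through the key,
and \funnm{free}() is registered as the destructor.  All names except the
\texttt{pthread\_} functions are made up for illustration.

\begin{verbatim}
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t key;
static pthread_once_t key_once = PTHREAD_ONCE_INIT;
static int key_err;     /* result of pthread_key_create() */

static void
make_key(void)
{
    /* free() is called on non-NULL values at thread exit. */
    key_err = pthread_key_create(&key, free);
}

/* Callable from any function; returns this thread's counter. */
static int *
my_counter(void)
{
    int *p = pthread_getspecific(key);

    if (p == NULL) {    /* first call in this thread */
        p = calloc(1, sizeof (*p));
        if (p != NULL && pthread_setspecific(key, p) != 0) {
            free(p);
            p = NULL;
        }
    }
    return (p);
}

static void *
thrfn(void *arg)
{
    int *c;

    (void) pthread_once(&key_once, make_key);
    if (key_err != 0)
        return (NULL);
    for (int i = 0; i < 1000; i++) {
        if ((c = my_counter()) == NULL)
            return (NULL);
        (*c)++;         /* no locking, the data is private */
    }
    *(int *)arg = *c;   /* report the final value */
    return (NULL);
}

/* Returns the sum of the two per thread counters. */
int
run_tsd(void)
{
    pthread_t t[2];
    int out[2] = { 0, 0 };
    int i, n = 0;

    while (n < 2 &&
        pthread_create(&t[n], NULL, thrfn, &out[n]) == 0)
        n++;
    for (i = 0; i < n; i++)
        (void) pthread_join(t[i], NULL);
    return (out[0] + out[1]);
}
\end{verbatim}

Both threads increment ``the same'' counter by name but each one reaches 1000
on its own copy, without any synchronization.
\end{itemize}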

%%%%%

ifdef([[[NOSPELLCHECK]]], [[[
\pdfbookmark[1]{pthread\_cleanup\_push, pthread\_cleanup\_pop}{pthreadcleanup}
]]])

\begin{slide}
\sltitle{Cleanup functions}
\begin{itemize}
\item each thread has a stack of cleanup routines called when functions
\funnm{pthread\_exit}() or \funnm{pthread\_cancel}() are called (but not when
\texttt{return} is used).  Routines are run from the top of the stack down.
\item after cleanup functions are called, thread specific data destructors are
called in unspecified order
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\funml{void \funnm{pthread\_cleanup\_push}(\=void (*\emph{routine})(void *),
\\\>void *\emph{arg});}
]]])
\begin{itemize}
\item add \emph{routine} to the top of the stack
\end{itemize}
\texttt{void \funnm{pthread\_cleanup\_pop}(int \emph{execute});}
\begin{itemize}
\item remove a routine from the top of the stack.  Will call the routine if
\emph{execute} is non-zero
\end{itemize}
\end{slide}

\hlabel{PTHREAD_CLEANUP}

\begin{itemize}
\item The cleanup function is called as \texttt{routine(arg)}.
\item The \funnm{pthread\_cleanup\_push} and \funnm{pthread\_cleanup\_pop}
functions provide a sort of bracketing around a code block that might need a
cleanup.  In fact, more often than not these functions are implemented as macros
that open and close a code block.  This enforces the paired usage within one
lexical scope (and causes trouble with \texttt{goto} statements).
\item Run a C preprocessor on \example{pthreads/pthread-cleanup.c} and see how
this is done.
\end{itemize}
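\begin{itemize}
\item A sketch of the mutex cleanup mentioned with thread cancellation.  The
thread registers a routine that unlocks the mutex, so a cancellation inside the
bracketed block does not leave the mutex locked.  The function names are
hypothetical.

\begin{verbatim}
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t mx = PTHREAD_MUTEX_INITIALIZER;

/* Cleanup routine, called as routine(arg). */
static void
unlock_mutex(void *arg)
{
    (void) pthread_mutex_unlock(arg);
}

static void *
holder(void *arg)
{
    (void)arg;
    (void) pthread_mutex_lock(&mx);
    pthread_cleanup_push(unlock_mutex, &mx);
    for (;;)
        sleep(1);   /* cancellation point, cleanup runs here */
    pthread_cleanup_pop(1);  /* not reached; closes the block */
    return (NULL);
}

/* Returns 0 if the cancelled thread released the mutex. */
int
cancel_holder(void)
{
    pthread_t thr;
    int e;

    if (pthread_create(&thr, NULL, holder, NULL) != 0)
        return (-1);
    if (pthread_cancel(thr) != 0 || pthread_join(thr, NULL) != 0)
        return (-1);
    /* EBUSY would mean the mutex was left locked. */
    if ((e = pthread_mutex_trylock(&mx)) == 0)
        (void) pthread_mutex_unlock(&mx);
    return (e);
}
\end{verbatim}

The \funnm{pthread\_cleanup\_pop}() call must stay in the same function and
block as the matching push even though it is never reached here.
\end{itemize}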

%%%%%

ifdef([[[NOSPELLCHECK]]], [[[
\pdfbookmark[1]{pthread\_atfork}{pthreadatfork}
]]])

\begin{slide}
\sltitle{\funnm{fork}() and POSIX threads}
\prgchars
\begin{itemize}
\item it is necessary to define semantics of \funnm{fork}() in multithreaded
applications.  The standard says:

\begin{itemize}
\item the new process contains an exact copy of the calling thread,
including all the mutex states
\item other threads are not propagated to the new process
\item if such other threads contain sole references to allocated memory, the
memory will remain allocated but lost (leaked)
\item mutexes locked in other threads will remain locked forever
\end{itemize}
\item creating a process from a multithreaded application makes sense mainly
when followed by \funnm{exec}(), which includes the use of \funnm{popen}() or
\funnm{system}()
\end{itemize}
\end{slide}

\begin{itemize}
\item No cleanup routines or thread specific data destructors are called for
threads not propagated to the new process.
\item \hlabel{FORKALL} Note that the way \funnm{fork}() works also depends on
the system used.  For example, in Solaris before version 10 (i.e. before 2005),
\funnm{fork}() in the \texttt{libthread} library (different from
\texttt{libpthread}) was the same as \funnm{forkall}().
\item Examples: \example{pthreads/fork.c},
\example{pthreads/fork-not-in-main.c}, and also \example{pthreads/forkall.c}
\item \hlabel{ATFORK} You can use \funnm{pthread\_atfork} to set handlers that
are executed in the parent before \funnm{fork} creates the new process, and
then after \funnm{fork} both in the parent and in its child.  The handlers are
executed in the context of the thread that calls \funnm{fork}.  Such
handlers are very useful when \funnm{fork} is used for more than just a
wrapper around \funnm{exec}.  After \funnm{fork}, all variables in the child are
in the same state as in the parent, so if a thread not present in the child held
a mutex in the parent (see page \pageref{MUTEXES}), the mutex stays locked in
the child, and trying to lock it in the child will lead to a deadlock.  However,
if the parent locks all the mutexes in the \emph{\texttt{pre-fork}} handler and
then unlocks them in the \emph{\texttt{post-fork}} handler (both for the parent
and the child), you will avoid such deadlocks.  That is because when locking
mutexes in the \emph{\texttt{pre-fork}} handler, other threads are still running
so the mutexes held by them should be released eventually (usually each thread
exits a critical section in a short time in well written code).
\item This scheme will only work if the \emph{\texttt{pre-fork}} handler
maintains the same locking protocol/ordering as is used in the
application/library.  Sometimes that is just not possible due to multiple
complex orderings in place.  Very often \funnm{fork} is called just as a means
for a later \funnm{exec}.  In such a case the programmer might be better off
using \funnm{posix\_spawn}.
\item Example: \example{pthreads/atfork.c}
\item For more on this topic, see [Butenhof].
\item See page \pageref{MUTEXES} on why mutexes locked in other threads on
\funnm{fork}() stay locked forever.
\end{itemize}
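\begin{itemize}
\item The handler scheme described above can be sketched as follows, assuming
a single mutex so that lock ordering is not an issue.  A helper thread holds
the mutex for a moment while the main thread forks.  All names except the
library functions are made up, and this is an illustration of the idea only.

\begin{verbatim}
#include <pthread.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static pthread_mutex_t mx = PTHREAD_MUTEX_INITIALIZER;
static pthread_once_t once = PTHREAD_ONCE_INIT;

static void pre_fork(void)  { (void) pthread_mutex_lock(&mx); }
static void post_fork(void) { (void) pthread_mutex_unlock(&mx); }

static void
install_handlers(void)
{
    /* The same handler serves the parent and the child. */
    (void) pthread_atfork(pre_fork, post_fork, post_fork);
}

/* Holds the mutex for 100 ms so fork() likely finds it locked. */
static void *
holder(void *arg)
{
    struct timespec ts = { 0, 100 * 1000 * 1000 };

    (void)arg;
    (void) pthread_mutex_lock(&mx);
    (void) nanosleep(&ts, NULL);
    (void) pthread_mutex_unlock(&mx);
    return (NULL);
}

/* Returns the child's exit status, 0 meaning no deadlock. */
int
fork_with_handlers(void)
{
    pthread_t thr;
    pid_t pid;
    int status = -1;

    /* Register only once, handlers cannot be removed. */
    (void) pthread_once(&once, install_handlers);
    if (pthread_create(&thr, NULL, holder, NULL) != 0)
        return (-1);
    if ((pid = fork()) == 0) {
        /* Child: single thread, mutex unlocked by post_fork(). */
        (void) pthread_mutex_lock(&mx);
        (void) pthread_mutex_unlock(&mx);
        _exit(0);
    }
    if (pid > 0)
        (void) waitpid(pid, &status, 0);
    (void) pthread_join(thr, NULL);
    if (pid == -1 || !WIFEXITED(status))
        return (-1);
    return (WEXITSTATUS(status));
}
\end{verbatim}

Without the handlers, the child could inherit the mutex in the locked state and
block forever in \funnm{pthread\_mutex\_lock}().
\end{itemize}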

%%%%%

ifdef([[[NOSPELLCHECK]]], [[[
\pdfbookmark[1]{pthread\_sigmask}{pthreadsigmask}
]]])

\begin{slide}
\sltitle{Signals and threads}
\prgchars
\begin{itemize}
\item signals can be generated for a process (the \texttt{kill} syscall) or
for a thread (error conditions, the \texttt{pthread\_kill} call).
\item signal handling is the same for all threads in the process,
the mask of blocked signals is specific for each thread, can be set with
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\funml{int \funnm{pthread\_sigmask}(\=int \emph{how},
const sigset\_t *\emph{set},\\\> sigset\_t *\emph{oset});}
]]])
\begin{itemize}
\item a signal for a process is handled by one of its threads, one of those
that do not have such signal blocked.
\item one thread can be dedicated for signal handling using the
\texttt{sigwait} call. The other threads have the signals blocked.
\end{itemize}
\end{slide}

\hlabel{PTHREADSIGMASK}

\begin{itemize}
\item If the action for a signal is set to process exit, the whole process
will exit, not just one thread.
\item A new thread inherits the signal mask of the thread that created it.
\item As with the use of \texttt{sigwait} in single threaded processes (page
\pageref{SIGWAIT}), block the given signals in all threads, including the
thread that processes the signals using \texttt{sigwait}.
\emsl{This way of signal handling in a threaded environment is usually the only
recommended one.}  And it is easy to implement as well.  Following from the
previous note, it is sufficient to mask the signals only once, in the main
thread (before creating any new threads), because the mask will be inherited
with each \texttt{pthread\_create} call.
\item Do not use \texttt{sigprocmask} (page \pageref{SIGPROCMASK}) in a threaded
environment because the behavior of this call is not specified by the standard
in such an environment.  It may work, or not.
\item \hlabel{THREADS_SIGWAIT} Example: \example{pthreads/sigwait.c}.
\item \emsl{Note} that this way of signal handling should not be used for
synchronous signals such as \texttt{SIGSEGV}, \texttt{SIGILL}, etc. These
signals are generated directly for a thread, so if blocked the dedicated signal
handling thread may not ``see'' them (if it did not cause them by itself).
Also, the standard does not specify if blocking these signals should actually
work as mentioned on page \pageref{SPECIALSIGNALS}. Some systems
normally deliver these signals, making the process exit. The standard says:

\begin{quote}
\emph{If any of the \texttt{SIGFPE, SIGILL, SIGSEGV}, or \texttt{SIGBUS} signals
are generated while they are blocked, the result is undefined, unless the signal
was generated by the \texttt{kill} function, the \texttt{sigqueue} function, or
the \texttt{raise} function.}
\end{quote}

Example: \example{pthreads/sigwait-with-sync-signals.c}.  This example shows
that on Solaris 10 and 11, FreeBSD 7.2, and a Linux distribution that reports
itself as ``Gentoo Base System release 1.12.13'', the \texttt{SIGSEGV} signal is
delivered and the process killed even though the signal is masked.
There used to be a system that did not deliver the signal when masked
-- FreeBSD 6.0.  It should be possible to handle synchronous signals
if you terminate the process in the handler itself (e.g. after printing a user
friendly error message), see page \pageref{SPECIALSIGNALS}, which also contains
an example.

\end{itemize}
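\begin{itemize}
\item The recommended scheme with a dedicated signal handling thread can be
sketched as follows.  The signal is blocked before any thread is created, one
thread accepts it with \texttt{sigwait}, and the \funnm{kill}() call stands in
for a signal sent from outside.  The choice of \texttt{SIGUSR1} and the
function names are arbitrary.

\begin{verbatim}
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

static sigset_t set;
static int received;   /* written by the signal thread only */

static void *
signal_thread(void *arg)
{
    (void)arg;
    /* Synchronously accept one of the blocked signals. */
    if (sigwait(&set, &received) != 0)
        received = -1;
    return (NULL);
}

/* Returns the number of the accepted signal, -1 on error. */
int
wait_for_signal(void)
{
    pthread_t thr;

    (void) sigemptyset(&set);
    (void) sigaddset(&set, SIGUSR1);
    /* Block first; threads created later inherit the mask. */
    if (pthread_sigmask(SIG_BLOCK, &set, NULL) != 0)
        return (-1);
    if (pthread_create(&thr, NULL, signal_thread, NULL) != 0)
        return (-1);
    /* Signal for the process, pending until accepted. */
    if (kill(getpid(), SIGUSR1) != 0)
        return (-1);
    if (pthread_join(thr, NULL) != 0)
        return (-1);
    return (received);
}
\end{verbatim}

No signal handler is installed at all.  Because the signal is blocked in every
thread, it stays pending until the dedicated thread picks it up, regardless of
whether it arrives before or after the \texttt{sigwait} call.
\end{itemize}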


%%%%%

\begin{slide}
\sltitle{Thread synchronization in general}

\begin{itemize}
\item most programs employing threads need to share data between them
\item or need to execute given actions in a certain order
\item \dots{}all of this requires \emsl{synchronizing} the activity of running
threads
\item for processes it is necessary to make some effort to actually share data,
for threads on the other hand it is necessary to manage the inherent data
sharing.
\item we will describe:
\begin{itemize}
\item mutexes
\item condition variables
\item read-write locks
\end{itemize}
\end{itemize}
\end{slide}

\hlabel{THREADSYNCHRONIZATION}

\begin{itemize}
\item Process synchronization is described on pages
\pageref{SYNCHRONIZATION} to \pageref{SYNCHRONIZATIONEND}.
\item By using mutexes and condition variables it is possible to construct any
other synchronization model.
\item The exact behavior of synchronization primitives is largely determined by
the scheduler.  It decides which of the threads waiting for a lock to be
released will be woken up after the lock is actually released. This leads to
classical problems such as a \emph{thundering herd} (lots of threads waiting for
unlock) or a \emph{priority inversion} (thread holding a lock has lower priority
than the thread waiting for the lock).
\end{itemize}

%%%%%

ifdef([[[NOSPELLCHECK]]], [[[
\pdfbookmark[1]{pthread\_mutex\_init, pthread\_mutex\_destroy}{pthreadinit}
]]])

\begin{slide}
\sltitle{Thread synchronization: mutexes (1)}

\begin{itemize}
\item the simplest way to ensure synchronized access to shared data between
threads
\item initialization of statically defined mutex:
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{pthread\_mutex\_t mutex = PTHREAD\_MUTEX\_INITIALIZER;}
]]])
\begin{itemize}
\item initialization of dynamically allocated mutex \texttt{mx} with attributes
\texttt{attr} (these are set using \texttt{pthread\_mutexattr\_...};
if \texttt{attr} is \texttt{NULL}, default attributes will be used)
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\funml{int \funnm{pthread\_mutex\_init}(\=pthread\_mutex\_t *\emph{mx},
\\\>const pthread\_mutexattr\_t *\emph{attr});}
]]])
\begin{itemize}
\item once the mutex is no longer needed, it can be destroyed:
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{int \funnm{pthread\_mutex\_destroy}(pthread\_mutex\_t *\emph{mx});}
]]])
\end{slide}

\hlabel{MUTEXES}

\begin{itemize}
\item Mutex = \emph{mutual exclusion}
\item Special form of Dijkstra semaphores -- the difference between mutexes and
binary semaphores is that \emsl{a mutex has an owner and a locked mutex must be
unlocked only by the thread that acquired it.} This is not the case with
semaphores. In order to check whether a given mutex was locked by a different
thread when acquiring it, it is necessary to test the return value of
\texttt{p\-thread\_mutex\_lock}, and also have the lock checking set, see below.
\item Mutexes are meant to be held for short time only. They are used for
critical section (see the definition on page \pageref{CRITICALSECTION})
implementation, similarly to lock-files or semaphores (if used like locks).
\item Lock checking is governed by a mutex type.  By default the mutex type is
set to \texttt{PTHREAD\_MUTEX\_DEF\-AULT}.  This type by itself does not define
the result of (a) locking a locked mutex, (b) unlocking a mutex locked by a
different thread, or (c) unlocking an unlocked mutex.  Unix/Linux systems will
map that macro to either \texttt{PTHREAD\_MUTEX\_NORMAL} or
\texttt{PTHREAD\_\-MUT\-EX\_ERRORCHECK} (ignoring the recursive type, see
below).  Thus, depending on a specific system, locking an already locked mutex
will result in a deadlock (\texttt{NORMAL}) or not (\texttt{ERRORCHECK}).  In
the latter case, a return value will contain information about the error and if
not tested, the program will wrongly assume the mutex is locked.  For the
\texttt{NORMAL} mutex type the result of (b) and (c) is not
defined, for \texttt{ERRORCHECK} an error will be returned.
In general you should avoid any undefined behavior unless specifically
documented by the system at hand.  More information can be found in the POSIX
standard or the \texttt{pth\-read\_mutex\-attr\_set\-ty\-pe} man page.  Checking
return values of mutex functions can make the code slightly less readable;
however, the checks can be wrapped in a macro. Alternatively, the checks can be
used during development only.  Solaris and Linux use \texttt{NORMAL} type by
default, FreeBSD uses \texttt{ERRORCHECK}.  \hlabel{NOTMYLOCK}
Example: \example{mutexes/not-my-lock.c}.
\item Another type is \texttt{PTHREAD\_MUTEX\_RECURSIVE} that holds a count of
lock actions done by given thread. The remaining threads will be granted access
only if the count reaches 0. This mutex cannot be shared between processes.
\item What are recursive mutexes good for? Let's assume there are two
libraries, \texttt{A} and \texttt{B}.  A library function \texttt{A:foo()}
acquires a mutex and calls \texttt{B:bar()}, which in turn calls
\texttt{A:bar()}, which tries to acquire the same mutex.  Without recursive
locks a deadlock will ensue.  With recursive mutexes that's fine if these two
calls are done by the same thread (another thread will get blocked).  That is,
assuming \texttt{A:foo()} and \texttt{A:bar()} are aware that the same thread
can already be in the critical section.
\item \hlabel{MUTEXTAB} The behavior according to mutex types:\\
\\
\raisetab{
\begin{tabular}[t]{r|c|c|c|}
% \cline{2-4}
&\texttt{NORMAL}&\texttt{ERRORCHECK}&\texttt{RECURSIVE}\\
% \cline{2-4}
detects deadlock&N&Y&N/A\\
multi locking&deadlock&error&success\\
unlock by different thread&undefined&error&error\\
unlock unlocked&undefined&error&error\\
can be shared between processes&Y&Y&N
% \cline{2-4}
\end{tabular}}
\item Static mutex initialization using the macro mentioned above will set
default attributes. It is also possible to use the initializer function for a
statically allocated mutex. If a mutex is dynamically allocated, it is always
necessary to use \texttt{pthread\_mutex\_init}, whether or not the default
attributes are desired.
\item Dynamic mutexes are needed e.g. when a data structure containing a mutex
protecting it is dynamically allocated.  In such a case, before calling
\texttt{free} with the data structure, it is first necessary to properly destroy
the mutex (that can also have some memory allocated).  Destroying a locked mutex
is not defined by the standard.
\item Copying mutexes is also not defined by the standard -- the result of such
an operation depends on the implementation.  It is possible to copy a pointer to
a mutex and work with that.
\item Destroying a mutex means deinitializing it.
\end{itemize}
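
\begin{itemize}
\item The type-dependent behavior from the table above can be observed with a
short sketch.  It is only an illustration, assuming a system that implements
\texttt{PTHREAD\_MUTEX\_ERRORCHECK} with the error codes POSIX lists
(\texttt{EDEADLK} and \texttt{EPERM}); the function name
\texttt{errorcheck\_demo} is made up for this note and is not part of the
course examples.

\begin{verbatim}
#define _XOPEN_SOURCE 700   /* for pthread_mutexattr_settype() */
#include <errno.h>
#include <pthread.h>

/*
 * Returns 0 if an ERRORCHECK mutex reports both a relock and an
 * unlock of an unlocked mutex as errors, -1 otherwise.
 */
int
errorcheck_demo(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t mx;
    int relock, unlock_unlocked;

    if (pthread_mutexattr_init(&attr) != 0)
        return (-1);
    if (pthread_mutexattr_settype(&attr,
        PTHREAD_MUTEX_ERRORCHECK) != 0 ||
        pthread_mutex_init(&mx, &attr) != 0) {
        (void) pthread_mutexattr_destroy(&attr);
        return (-1);
    }
    /* The attribute object is not needed once the mutex exists. */
    (void) pthread_mutexattr_destroy(&attr);

    if (pthread_mutex_lock(&mx) != 0)
        return (-1);
    relock = pthread_mutex_lock(&mx);            /* EDEADLK expected */
    (void) pthread_mutex_unlock(&mx);
    unlock_unlocked = pthread_mutex_unlock(&mx); /* EPERM expected */
    (void) pthread_mutex_destroy(&mx);

    return (relock == EDEADLK && unlock_unlocked == EPERM ? 0 : -1);
}
\end{verbatim}

With the \texttt{NORMAL} type the second \texttt{pthread\_mutex\_lock} call
would deadlock instead, which is why the sketch sets the type explicitly rather
than relying on the system default.
\end{itemize}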

%%%%%

ifdef([[[NOSPELLCHECK]]], [[[
\pdfbookmark[1]{pthread\_mutex\_lock, pthread\_mutex\_unlock,%
pthread\_mutex\_trylock}{pthreadmutexfncs}
]]])

\begin{slide}
\sltitle{Mutexes (2)}

\begin{itemize}
\item to lock and unlock mutex:
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{int \funnm{pthread\_mutex\_lock}(pthread\_mutex\_t *\emph{mx});}
]]])
and
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{int \funnm{pthread\_mutex\_unlock}(pthread\_mutex\_t *\emph{mx});}
]]])
\begin{itemize}
\item If a mutex is already locked, the attempt to acquire it will result
in the thread being blocked (depending on mutex type).
It is possible to use:
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{int \funnm{pthread\_mutex\_trylock}(pthread\_mutex\_t *\emph{mx});}
]]])
\begin{itemize}
\item[\dots] that will attempt to acquire the mutex and if that fails it will
return an error
\end{itemize}
\end{slide}

\hlabel{MUTEXES2}

\begin{itemize}
\item If you need to unlock a mutex locked by a different thread, use binary
semaphores instead.
\item When creating a program where efficiency is paramount, it is necessary to
think about how many mutexes will be needed and how exactly they will be used.
Even a library that was not written with threads in mind can be converted to be
thread-safe (see page \pageref{THREADSAFE}) by acquiring a per-library lock on
any library function entry and releasing the lock before the function exits.
Such a lock may be called a ``giant'' mutex, and it may lead to lock contention
for every consumer of such a library as at any given moment only one thread may
execute the library code.  On the other hand, if using a large number of mutexes
to synchronize access to many small sections, a significant amount of time might
be spent in the overhead of calling functions implementing the locking.  It is
therefore desirable to look for a compromise.  (Or use an algorithm that does
not require locks at all -- such algorithms/techniques are called \emph{lock
free} or \emph{lockless}).
\item \hlabel{MUTEX_RACE} Examples: \example{mutexes/race.c},
\example{mutexes/race-fixed.c}
\item Mutexes can be shared between processes so that their threads will
synchronize on them. This is done by placing the mutex in shared memory and
setting the process-shared attribute of the mutex. See the
\texttt{pthread\_mutexattr\_setpshared} man page.
\end{itemize}
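
\begin{itemize}
\item A minimal sketch of the basic locking calls follows.  Several threads
increment one shared counter under a mutex, and a second function shows what
\texttt{pthread\_mutex\_trylock} reports for a mutex that is already held.
All names (\texttt{worker}, \texttt{locked\_counter\_demo},
\texttt{trylock\_held\_demo}) and the constants are chosen for this
illustration only; the sketch is not one of the course examples.

\begin{verbatim}
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define NTHREADS 4
#define NLOOPS 100000

static pthread_mutex_t counter_mx = PTHREAD_MUTEX_INITIALIZER;
static long counter;    /* shared data protected by counter_mx */

static void *
worker(void *arg)
{
    (void) arg;
    for (int i = 0; i < NLOOPS; ++i) {
        pthread_mutex_lock(&counter_mx);
        ++counter;      /* the critical section */
        pthread_mutex_unlock(&counter_mx);
    }
    return (NULL);
}

/* Runs the workers; returns the final counter value or -1 on error. */
long
locked_counter_demo(void)
{
    pthread_t tid[NTHREADS];
    int i, n;

    counter = 0;
    for (n = 0; n < NTHREADS; ++n)
        if (pthread_create(&tid[n], NULL, worker, NULL) != 0)
            break;
    for (i = 0; i < n; ++i)
        (void) pthread_join(tid[i], NULL);
    return (n == NTHREADS ? counter : -1);
}

/* Returns what pthread_mutex_trylock() reports for a held mutex. */
int
trylock_held_demo(void)
{
    static pthread_mutex_t mx = PTHREAD_MUTEX_INITIALIZER;
    int ret;

    if (pthread_mutex_lock(&mx) != 0)
        return (-1);
    ret = pthread_mutex_trylock(&mx);   /* does not block */
    (void) pthread_mutex_unlock(&mx);
    return (ret);
}
\end{verbatim}

Because every increment happens inside the critical section, the final value
equals \texttt{NTHREADS * NLOOPS}; without the mutex some increments could be
lost.
\end{itemize}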


%%%%%

\begin{slide}
\sltitle{Condition variables (1)}
\begin{itemize}
\item mutexes provide synchronization for shared data
\item condition variables pass information about the shared data --
for example that the value has changed
\item \dots{}and allow putting threads to sleep and waking them up
\item therefore \emsl{each condition variable is always associated
with exactly one mutex}
\item one mutex can be associated with multiple condition variables
\item using mutexes and condition variables it is possible to construct
other synchronization primitives -- semaphores, barriers, \dots
\end{itemize}
\end{slide}

\hlabel{CONDITION_VARIABLES}

\begin{itemize}
\item In other words -- condition variables are handy in a situation when a
thread needs to test the state of \emsl{shared} data (e.g. number of elements in
a queue) and voluntarily put itself to sleep if the state is not as desired.
The sleeping thread may be woken up by another thread after the latter
changed the state of the data in a way that the situation which the first thread
was waiting on actually happened (e.g. by inserting an item into a queue).  The
second thread wakes the first one by calling a designated function.  If no
thread is sleeping at the moment, that function would have no effect -- nothing
will be saved anywhere, it is as if it never happened.
\item A condition variable, which is an opaque type for the programmer, is not
associated with a concrete condition like ``\emph{\texttt{n} is greater than
7}''.  A condition variable may in fact be compared to a flag of a certain
color; if it is lifted up, it means that the threads waiting for the flag to be
raised are informed (= woken up) and may use this information at their own
discretion.  Some threads may wait for \texttt{n} to be bigger than 7, some
others may be waiting solely for \texttt{n} to change, and yet another for
\texttt{n} to become 99.  It is only up to the programmer whether a separate
condition variable will be used for each state of \texttt{n} (i.e. we would use
multiple flags of different colors) or whether a single condition variable will
be used (the same flag color for all situations).  For the latter, the threads
waiting on \texttt{n > 7} and \texttt{n == 99} must always test \texttt{n} as
they know that they are woken up whenever the variable changes.  If the
condition a thread is waiting for does not hold, the thread must voluntarily put
itself to sleep again.  As explained further, the \emsl{test must be performed
after every wake-up} even if a dedicated condition variable is used for every
possible state -- the system may wake up a sleeping thread (for various
implementation reasons) without any other thread causing this; it is
called a \emph{spurious wake-up}.
\end{itemize}

%%%%%

ifdef([[[NOSPELLCHECK]]], [[[
\pdfbookmark[1]{pthread\_cond\_init, pthread\_cond\_destroy,%
pthread\_cond\_wait}{pthreadcondvarfncs}
]]])

\begin{slide}
\sltitle{Condition variables (2)}
\prgchars
ifdef([[[NOSPELLCHECK]]], [[[
\funml{int \funnm{pthread\_cond\_init}(\=pthread\_cond\_t *\emph{cond},
\\\>const pthread\_condattr\_t *\emph{attr});}
]]])
\begin{itemize}
\item initializes condition variable \texttt{cond} with attributes \texttt{attr}
(they are set with the \texttt{pthread\_condattr\_...()} functions),
\texttt{NULL} = default.
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{int \funnm{pthread\_cond\_destroy}(pthread\_cond\_t *\emph{cond});}
]]])
\begin{itemize}
\item destroys condition variable.
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\funml{int \funnm{pthread\_cond\_wait}(\=pthread\_cond\_t *\emph{cond},
\\\>pthread\_mutex\_t *\emph{mutex});}
]]])
\begin{itemize}
\item waits on condition variable until another thread calls
ifdef([[[NOSPELLCHECK]]], [[[
\funnm{pthread\_cond\_signal()} or \funnm{pthread\_cond\_broadcast()}.
]]])
\end{itemize}
\end{slide}

\begin{itemize}
\item While the condition variables are used for putting threads to sleep and
waking them up, given that we work with shared data, there is always a mutex
involved when working with a condition variable.
\item It is necessary to test the condition after the thread locks the mutex
and before \texttt{pthread\_cond\_wait} is called. If the thread does not
perform this operation, it could be put to sleep indefinitely because the
message from another thread about the condition changing could go ``unnoticed''.
In other words, a thread must not go to sleep to wait for a situation which has
already happened in the meantime. It does not work like signals which are
``held'' by the system if they are blocked. It is important to perform the test
under the protection of the mutex, to be sure what the state of the data is when
calling \texttt{pthread\_cond\_wait}.
\item The condition variables API works thanks to the fact that when entering
critical section the mutex is locked by the thread and the
\emsl{\texttt{pthread\_cond\_wait} function will unlock the mutex before putting
the thread to sleep}. Before exiting from the function the mutex is locked
again.  It may therefore happen that the thread is woken up while waiting on a
condition variable and then put to sleep again when hitting a mutex already
locked by another thread.  There is nothing complicated about this, it is merely
a mutual exclusion of threads in a critical section.
\end{itemize}

%%%%%

ifdef([[[NOSPELLCHECK]]], [[[
\pdfbookmark[1]{pthread\_cond\_signal, pthread\_cond\_broadcast,
pthread\_cond\_timedwait}{pthreadcondunblfncs}
]]])

\begin{slide}
\sltitle{Condition variables (3)}
\prgchars
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{int \funnm{pthread\_cond\_signal}(pthread\_cond\_t *\emph{cond});}
]]])
\begin{itemize}
\item wakes up one thread waiting on condition variable
\texttt{cond}.
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{int \funnm{pthread\_cond\_broadcast}(pthread\_cond\_t *\emph{cond});}
]]])
\begin{itemize}
\item wakes all threads waiting on condition variable
\texttt{cond}.
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\funml{int \funnm{pthread\_cond\_timedwait}(\=pthread\_cond\_t *\emph{cond}, 
\\\>pthread\_mutex\_t *\emph{mutex},\\\> const struct timespec *\emph{atime});}
]]])
\begin{itemize}
\item waits for \texttt{pthread\_cond\_signal()} or
\texttt{pthread\_cond\_broadcast()} or until the system time reaches the
absolute time given by the \texttt{atime} value.
\end{itemize}
\end{slide}

\begin{itemize}
\item One condition variable can be used to announce multiple situations at
once -- e.g. when inserting or removing an item to/from a queue.  Because of
this, it is necessary to test the condition the thread is waiting for.  Another
consequence of this is that it is necessary to use a broadcast in such a
situation.  Let us assume that both readers and writers are waiting for a
condition ``state of the queue has changed''.  If only a single wake-up event is
made after an item insertion to the queue, a writer may be woken up; it is,
however, waiting for a different event -- an item removal -- so it puts itself
to sleep again.  Thus, the item will remain in the queue until a reader is
woken up.
\item A thread may be woken up by another thread even in a case when a condition
variable is associated with a specific event that is no longer true after the
waiting thread is woken up.  Let's consider this situation: a thread signals a
condition change and right after that another thread locks the mutex and
performs an action which invalidates the condition, e.g. removing an item from a
queue while the event was ``there is an item in the queue''.  So, the thread
woken up finds the queue empty.  That is another reason why the condition the
thread is waiting for must \emsl{always} be tested in a loop.
\item It is also possible that a thread is woken up and the condition is not
true due to a spurious wake-up already mentioned on a previous page.  So again,
the loop must be used.
\item The \texttt{atime} parameter of the \texttt{pthread\_cond\_timedwait}
function (the standard calls it \texttt{abstime}) is an absolute time, i.e. the
timeout expires when the system time reaches or exceeds \texttt{atime}.  The
absolute time is used so that it is not necessary to recompute the time
difference after wake-up events.
\end{itemize}
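
\begin{itemize}
\item How the absolute timeout is typically built can be sketched as follows.
The sketch assumes the condition variable uses the default clock
(\texttt{CLOCK\_REALTIME}) and that nobody ever signals it, so the wait is
expected to end with \texttt{ETIMEDOUT}.  The function name
\texttt{timedwait\_demo} and the predicate \texttt{ready} are hypothetical.

\begin{verbatim}
#define _XOPEN_SOURCE 700   /* for clock_gettime() */
#include <errno.h>
#include <pthread.h>
#include <time.h>

/*
 * Waits at most about one second for a condition that nobody
 * signals; returns the last pthread_cond_timedwait() result.
 */
int
timedwait_demo(void)
{
    static pthread_mutex_t mx = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
    static int ready;       /* the predicate; stays 0 in this sketch */
    struct timespec atime;
    int ret = 0;

    /* The deadline is absolute: current time plus the timeout. */
    if (clock_gettime(CLOCK_REALTIME, &atime) != 0)
        return (-1);
    atime.tv_sec += 1;

    pthread_mutex_lock(&mx);
    /* The same deadline is reused after every wake-up. */
    while (!ready && ret == 0)
        ret = pthread_cond_timedwait(&cv, &mx, &atime);
    pthread_mutex_unlock(&mx);

    return (ret);
}
\end{verbatim}

Note that the loop does not have to recompute anything after a spurious
wake-up, which is the benefit of the absolute time mentioned above.
\end{itemize}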

%%%%%

\begin{slide}
\hlabel{CONDVAR_USE}
\sltitle{Using condition variables}
\begin{alltt}
pthread\_cond\_t cond; pthread\_mutex\_t mutex;
...
\emprg{pthread\_mutex\_lock}(&mutex);
while (!condition(data))
    \emblue{pthread\_cond\_wait}(&cond, &mutex);
process\_data(data, ...);
\emprg{pthread\_mutex\_unlock}(&mutex);
...
\emprg{pthread\_mutex\_lock}(&mutex);
produce\_data(data, ...);
\emblue{pthread\_cond\_signal}(&cond);
\emprg{pthread\_mutex\_unlock}(&mutex);
\end{alltt}
\end{slide}

\begin{itemize}
\prgchars
\item The first piece of code is waiting for the condition to change.  If this
happens, the data changed and can therefore be processed.  The second piece of
code (executed in a different thread) prepares the data so it can be processed.
Once ready, it will signal the consumer.
\item The \texttt{pthread\_cond\_wait} function will atomically unlock the mutex
and put the thread to sleep. Once the thread is woken up, the system will lock
the mutex first (this will be done by the condition variable implementation) and
only after that \texttt{pthread\_cond\_wait} will return.
\item When a thread receives a signal that something has changed, it does not
mean that after the wake up the condition will be true. Moreover,
\texttt{pthread\_cond\_wait} can return even if no thread called
\texttt{pthread\_cond\_signal} or \texttt{pthread\_cond\_broadcast}.
This is another reason for testing the condition after wake-up and possibly
going to sleep again.
\item The mutex in the example above is unlocked only after the condition
was signalled; however, this is not necessary. The signalling can be done
after unlocking and in such a case it can be more efficient (depending on
a given implementation) because the thread that has been woken up will not get
immediately blocked by the mutex which is still held by the thread signalling
from within the critical section.
\\
That said, SUSv3 discourages this (unlocking then signalling / broadcasting),
saying it might cause ``unpredictable scheduling behavior'', possibly in the
environment of threads with different scheduling priorities.
\item \hlabel{QUEUESIMULATION} Example:
\example{cond-variables/queue-simulation.c}
\end{itemize}
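
\begin{itemize}
\item The pattern from the slide can be turned into a small self-contained
sketch with one producer and one consumer.  The shared state is reduced to a
counter of available items; the names \texttt{available}, \texttt{consumer} and
\texttt{condvar\_demo} are invented for this illustration and the code is not
the queue simulation example referenced above.

\begin{verbatim}
#include <pthread.h>
#include <stddef.h>

#define NITEMS 1000

static pthread_mutex_t mx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static int available;   /* items produced but not yet consumed */

static void *
consumer(void *arg)
{
    long *consumed = arg;

    for (int i = 0; i < NITEMS; ++i) {
        pthread_mutex_lock(&mx);
        /* Always re-test the predicate after a wake-up. */
        while (available == 0)
            pthread_cond_wait(&cv, &mx);
        --available;
        ++*consumed;
        pthread_mutex_unlock(&mx);
    }
    return (NULL);
}

/* Returns the number of items consumed, or -1 on error. */
long
condvar_demo(void)
{
    pthread_t tid;
    long consumed = 0;

    available = 0;
    if (pthread_create(&tid, NULL, consumer, &consumed) != 0)
        return (-1);
    for (int i = 0; i < NITEMS; ++i) {
        pthread_mutex_lock(&mx);
        ++available;                /* produce one item */
        pthread_cond_signal(&cv);   /* lost if nobody is waiting */
        pthread_mutex_unlock(&mx);
    }
    (void) pthread_join(tid, NULL);
    return (consumed);
}
\end{verbatim}

A signal sent while the consumer is not sleeping is simply lost, yet no item is
missed because the consumer tests \texttt{available} under the mutex before it
ever calls \texttt{pthread\_cond\_wait}.
\end{itemize}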


%%%%%

ifdef([[[NOSPELLCHECK]]], [[[
\pdfbookmark[1]{pthread\_rwlock\_init, pthread\_rwlock\_rdlock,%
pthread\_rwlock\_tryrdlock}{pthreadrwlockfncs}
]]])

\begin{slide}
\sltitle{Read-write locks (1)}
\prgchars
ifdef([[[NOSPELLCHECK]]], [[[
\funml{int \funnm{pthread\_rwlock\_init}(\=pthread\_rwlock\_t *\emph{l},
\\\>const pthread\_rwlockattr\_t *\emph{attr}); }
]]])
\begin{itemize}
\item creates a lock with attributes according to \texttt{attr}
(set via \texttt{pthread\_rwlockattr\_...()} functions, \texttt{NULL} = default)
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{int \funnm{pthread\_rwlock\_destroy}(pthread\_rwlock\_t *\emph{l});}
]]])
\begin{itemize}
\item destroys the lock
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{int \funnm{pthread\_rwlock\_rdlock}(pthread\_rwlock\_t *\emph{l});}\\
\texttt{int \funnm{pthread\_rwlock\_tryrdlock}(pthread\_rwlock\_t *\emph{rwlock});}
]]])
\begin{itemize}
\item acquires the lock for reading (more than one thread can hold the lock
for reading); if anyone holds the lock for writing, the calling thread is put
to sleep (\texttt{rdlock()}) or returns an error (\texttt{tryrdlock()}).
\end{itemize}
\end{slide}

\hlabel{RWLOCKS}

\begin{itemize}
\item You can also use static initialization
\texttt{PTHREAD\_RWLOCK\_INITIALIZER}, similarly to other synchronization
mechanisms.
\item Read-write locks are not a part of pthreads from POSIX.1c; they come from
the POSIX.1j extension called ``advanced real-time extensions''.
\item More than one thread can hold the lock for reading or at most one
thread for writing (and no one for reading).
\item Read-write locks are semantically similar to locking files using
\texttt{fcntl} function.
\item See \example{pthreads/pthread-rwlock.c} for a classic use case.
\item It is common that a given implementation prefers writer threads to
reader threads.  E.g. if a lock is owned by a writer while some other thread
calls function \funnm{pthread\_rwlock\_rdlock} and there is at least one thread
waiting in \funnm{pthread\_rwlock\_wrlock}, the writer will be given precedence.
See \example{pthreads/pthread-rwlock-pref.c}.
\item Any \texttt{pthread} implementation limits the number of readers that
can hold the lock at the same time (the limit follows from the type that holds
the lock count). If the maximum is reached, \funnm{pthread\_rwlock\_rdlock}
returns the \texttt{EAGAIN} error, see \example{pthreads/pthread-rwlock-limit.c}.
\end{itemize}
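
\begin{itemize}
\item The shared/exclusive semantics can be checked with a short sketch.  For
brevity a single thread takes both read locks (POSIX allows a thread to hold
several concurrent read locks) and uses the non-blocking variants so that
nothing can hang.  The function name \texttt{rwlock\_demo} is made up for this
note.

\begin{verbatim}
#define _XOPEN_SOURCE 700   /* for the pthread_rwlock_*() functions */
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/*
 * Returns 0 if a second read lock is granted while a write lock
 * is refused with EBUSY, -1 otherwise.
 */
int
rwlock_demo(void)
{
    pthread_rwlock_t rwl;
    int second_reader, writer;

    if (pthread_rwlock_init(&rwl, NULL) != 0)
        return (-1);
    if (pthread_rwlock_rdlock(&rwl) != 0) {
        (void) pthread_rwlock_destroy(&rwl);
        return (-1);
    }

    /* Readers do not exclude each other... */
    second_reader = pthread_rwlock_tryrdlock(&rwl);
    /* ...but a writer needs the lock for itself. */
    writer = pthread_rwlock_trywrlock(&rwl);

    /* One unlock for every read lock that was acquired. */
    if (second_reader == 0)
        (void) pthread_rwlock_unlock(&rwl);
    (void) pthread_rwlock_unlock(&rwl);
    (void) pthread_rwlock_destroy(&rwl);

    return (second_reader == 0 && writer == EBUSY ? 0 : -1);
}
\end{verbatim}
\end{itemize}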

%%%%%

ifdef([[[NOSPELLCHECK]]], [[[
\pdfbookmark[1]{pthread\_rwlock\_wrlock, pthread\_rwlock\_trywrlock,%
pthread\_rwlock\_unlock}{pthreadrwlockfncsw}
]]])

\begin{slide}
\sltitle{Read-write locks (2)}
\prgchars
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{int \funnm{pthread\_rwlock\_wrlock}(pthread\_rwlock\_t *\emph{rwlock});}
]]])
\begin{itemize}
\item acquires the lock for writing; blocks if anyone owns the lock for reading
or writing.
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{int \funnm{pthread\_rwlock\_trywrlock}(pthread\_rwlock\_t *\emph{rwlock});}
]]])
\begin{itemize}
\item like \texttt{pthread\_rwlock\_wrlock()}, however if the lock cannot be
acquired it will return an error.
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{int \funnm{pthread\_rwlock\_unlock}(pthread\_rwlock\_t *\emph{rwlock});}
]]])
\begin{itemize}
\item unlocks the lock
\end{itemize}
\end{slide}

\begin{itemize}
\item \emph{Interesting property}: if a thread waiting on a read-write lock
receives a signal, the signal handler will be invoked and then it will
transparently continue waiting, i.e. there will be no \texttt{EINTR} error.
The same is true for mutexes and condition variables.
\end{itemize}

%%%%%

\pdfbookmark[1]{Atomic arithmetic operations}{atomicadd}

\begin{slide}
\sltitle{Atomic arithmetic operations}
\begin{itemize}
\item for architectures where the arithmetic operations are not atomic
\item significantly faster than other mutual exclusion mechanisms
thanks to using native CPU instructions ensuring atomicity.
\item some systems or compilers supply functions for atomic operations,
(e.g. \texttt{atomic\_add(3c)} in Solaris \texttt{libc} or
\texttt{\_\_sync*} API in GCC)
\item generally it is better to use the routines from C11 standard
via \emph{\texttt{stdatomic.h}}, e.g. addition:
\end{itemize}
\begin{verbatim}
    #include <stdatomic.h>

    atomic_int acnt;
    atomic_fetch_add(&acnt, 1);
\end{verbatim}
\end{slide}

\begin{itemize}
\item \hlabel{ATOMIC_ADD} Example \example{race/atomic-add.c} demonstrates
the race condition problem when performing addition and its possible solutions.
The program spawns two threads, each thread works with the same global variable
\emsl{x}, and in a loop, each increments the variable by numbers from a sequence
of 1 to \emsl{\texttt{arg[1]}}.  The threads run in parallel, they compete for
access to \texttt{x}, and each performs the following cycle:

\begin{verbatim}
for (i = 1; i < arg; ++i)
        x = x + i;
\end{verbatim}

After the cycle the main thread performs a check of the resulting value of
\texttt{x}.   If the result is not a double of a sum of the sequence, a race
condition must have been hit (we can ignore an integer overflow in this case).
\par The results and time consumed to complete the program are radically
different for situations when the program used a plain unprotected addition,
atomic arithmetic functions, and locking using mutexes.  The difference in
completion times between arithmetic operation functions and mutexes is
especially notable on CPUs with hardware parallelism support.
\item Similarly there is an API for subtraction, bit operations AND and OR,
value assignment, etc.
\item The atomic primitives and types in C11 are an optional feature, so code
using them should be guarded by a check that the
\texttt{\_\_STDC\_NO\_ATOMICS\_\_} macro is not defined.
\item A good description of the C11 atomic API can be found on
\url{http://en.cppreference.com/w/c/atomic}. Some systems have the
\texttt{stdatomic(3)} man page.
\end{itemize}
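
\begin{itemize}
\item A minimal sketch of the C11 variant follows, including the
\texttt{\_\_STDC\_NO\_ATOMICS\_\_} guard.  It is not the
\texttt{race/atomic-add.c} example itself; the names \texttt{adder} and
\texttt{atomic\_add\_demo} and the loop bound are assumptions made for this
illustration.

\begin{verbatim}
#include <pthread.h>
#include <stddef.h>

#ifndef __STDC_NO_ATOMICS__
#include <stdatomic.h>

#define NTHREADS 2
#define NLOOPS 10000

static atomic_long x;   /* shared by all threads, no mutex needed */

static void *
adder(void *arg)
{
    (void) arg;
    for (long i = 1; i <= NLOOPS; ++i)
        atomic_fetch_add(&x, i);    /* indivisible read-modify-write */
    return (NULL);
}

/* Returns the final value of x, or -1 if a thread cannot be created. */
long
atomic_add_demo(void)
{
    pthread_t tid[NTHREADS];
    int i, n;

    atomic_store(&x, 0);
    for (n = 0; n < NTHREADS; ++n)
        if (pthread_create(&tid[n], NULL, adder, NULL) != 0)
            break;
    for (i = 0; i < n; ++i)
        (void) pthread_join(tid[i], NULL);
    return (n == NTHREADS ? atomic_load(&x) : -1);
}
#endif  /* __STDC_NO_ATOMICS__ */
\end{verbatim}

Each thread adds the numbers 1 to \texttt{NLOOPS}, so the result is
\texttt{NTHREADS} times the sum of that sequence regardless of how the threads
interleave.
\end{itemize}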


%%%%%

ifdef([[[NOSPELLCHECK]]], [[[
\pdfbookmark[1]{pthread\_barrier\_init, pthread\_barrier\_wait,%
pthread\_barrier\_destroy}{barrier}
]]])

\begin{slide}
\sltitle{Barrier}
\begin{itemize}
\item \emph{barrier} is a primitive that holds group of threads together
w.r.t. execution.
\item all threads wait on barrier until the last thread from the group
reaches it; then they can all continue together.
\item typical use case is parallel data processing
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{int \funnm{pthread\_barrier\_init}(pthread\_barrier\_t *\emph{barrier},
\emph{attr}, unsigned \emph{count});}
]]])
\begin{itemize}
\item initializes barrier for \emph{count} entrances
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{int \funnm{pthread\_barrier\_wait}(pthread\_barrier\_t *\emph{barrier});}
]]])
\begin{itemize}
\item blocks until the function is called \emph{count} times
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{int \funnm{pthread\_barrier\_destroy}(pthread\_barrier\_t *\emph{barrier});}
]]])
\begin{itemize}
\item destroys the barrier
\end{itemize}
\end{slide}

\hlabel{BARRIER}

\begin{itemize}
\item The barrier API has been defined since SUSv3.  It is a non-mandatory part
of POSIX (it belongs to the \emph{Advanced real-time threads extension}) and
hence a SUS certified system does not have to implement it, which is e.g. the
case of macOS.
However, a barrier can be implemented using mutexes and condition variables.
\item The barrier can be used e.g. in a situation when it is necessary to
perform some initialization between individual phases of processing.
The threads need to wait for each other because the initialization can only
begin once the previous phase is over.
Example \example{pthreads/pthread-barrier.c} shows the use in such case.
\item \texttt{pthread\_barrier\_wait} returns
the \texttt{PTHREAD\_BARRIER\_SERIAL\_THREAD} value
in the last thread that reached the barrier so e.g. a collection of
results from the last phase of the run can be done.
\item To implement the barrier without the API above, the fact that all threads
have reached the barrier may be indicated by a counter reaching 0, for
example.  Each thread that reaches the barrier decrements the counter, which is
initialized to the number of threads in the beginning.  Once a thread decrements
the counter and finds it is not 0, it waits on a condition variable.  If the
thread is the one which discovers the counter to be 0, instead of waiting it
sends a broadcast which wakes up all the threads sleeping on the barrier.
\texttt{pthread\_cond\_signal} is not enough, since it is necessary to wake up
all the threads, not just one.  Before entering the next phase the counter is
reset to its initial value.  This needs to be done carefully; for example, it
is not possible just to reinitialize the counter after the last thread reaches
the barrier, because as was shown on page \pageref{CONDITION_VARIABLES}, after
waking up from \texttt{pthread\_cond\_wait} the threads need to test that the
counter is indeed 0 and if not they need to be put to sleep again. So it can
happen that only some (or none) of the threads would wake up.  How would you
solve this problem? See \example{pthreads/implement-barrier.c} and
\example{pthreads/implement-barrier-fixed.c} for a solution.
\end{itemize}
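
\begin{itemize}
\item One common way to deal with the counter reset problem is to wait on a
\emph{generation} number instead of on the counter itself.  The following is
only a sketch of that idea and not necessarily the approach taken by the
example files above; the type \texttt{struct my\_barrier} and all function
names are hypothetical.

\begin{verbatim}
#include <pthread.h>
#include <stddef.h>

#define NTHREADS 4
#define NPHASES 3

/* Hypothetical barrier built from a mutex and a condition variable. */
struct my_barrier {
    pthread_mutex_t mx;
    pthread_cond_t cv;
    unsigned count;             /* number of participating threads */
    unsigned remaining;         /* threads yet to arrive in this phase */
    unsigned long generation;   /* bumped each time the barrier opens */
};

static struct my_barrier b = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER,
    NTHREADS, NTHREADS, 0
};
static pthread_mutex_t stat_mx = PTHREAD_MUTEX_INITIALIZER;
static int arrived[NPHASES];    /* protected by stat_mx */
static int errors;              /* protected by stat_mx */

static void
my_barrier_wait(struct my_barrier *bp)
{
    pthread_mutex_lock(&bp->mx);
    if (--bp->remaining == 0) {
        /* Last arrival: open the barrier and re-arm it at once. */
        bp->generation++;
        bp->remaining = bp->count;
        pthread_cond_broadcast(&bp->cv);
    } else {
        unsigned long gen = bp->generation;

        /* The predicate is the generation, not the counter. */
        while (gen == bp->generation)
            pthread_cond_wait(&bp->cv, &bp->mx);
    }
    pthread_mutex_unlock(&bp->mx);
}

static void *
worker(void *arg)
{
    (void) arg;
    for (int p = 0; p < NPHASES; ++p) {
        pthread_mutex_lock(&stat_mx);
        ++arrived[p];
        pthread_mutex_unlock(&stat_mx);

        my_barrier_wait(&b);

        /* Past the barrier, everybody must have finished phase p. */
        pthread_mutex_lock(&stat_mx);
        if (arrived[p] != NTHREADS)
            ++errors;
        pthread_mutex_unlock(&stat_mx);
    }
    return (NULL);
}

/* Returns the number of threads that passed a barrier too early. */
int
barrier_demo(void)
{
    pthread_t tid[NTHREADS];

    errors = 0;
    for (int p = 0; p < NPHASES; ++p)
        arrived[p] = 0;
    for (int i = 0; i < NTHREADS; ++i)
        if (pthread_create(&tid[i], NULL, worker, NULL) != 0)
            return (-1);
    for (int i = 0; i < NTHREADS; ++i)
        (void) pthread_join(tid[i], NULL);
    return (errors);
}
\end{verbatim}

Since a woken thread only compares the generation it saw on entry with the
current one, the last thread can safely reset the counter before anybody wakes
up, and spurious wake-ups are still handled by the loop.
\end{itemize}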

%%%%%

ifdef([[[NOSPELLCHECK]]], [[[
\pdfbookmark[1]{sem\_init, sem\_post, sem\_wait}{semaphores}
]]])

\begin{slide}
\sltitle{POSIX Semaphores -- unnamed}
\begin{itemize}
\item semaphores come from POSIX-1003.1b (real-time extensions)
\item the function names do not begin with \emsl{\texttt{pthread\_}},
but \emsl{\texttt{sem\_}}
\item possible to use them with threads
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{int \funnm{sem\_init}(sem\_t *\emph{s},
int \emph{pshared}, unsigned int \emph{value});}
]]])
\begin{itemize}
\item initialize semaphore
\item The \funnm{sem\_post} and \funnm{sem\_wait} are the same as for named
semaphores (see page \pageref{NAMED_SEMAPHORES})
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{int \funnm{sem\_destroy}(sem\_t *\emph{s});}
]]])
\begin{itemize}
\item destroys the unnamed semaphore
\end{itemize}
\end{slide}

\begin{itemize}
\item Semaphores created using this API do not have a name (as opposed to
those created via the \texttt{sem\_open} function), hence they are called
unnamed.
\item The semaphore functions adhere to the classical UNIX semantics --
they return -1 on error and set \texttt{errno}.
\item Example: \example{semaphores/sem.c}
\end{itemize}
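
\begin{itemize}
\item A minimal sketch of an unnamed semaphore used between two threads
follows.  It also illustrates the difference from mutexes noted earlier: the
semaphore is posted by a different thread than the one waiting on it.  The
names \texttt{poster} and \texttt{sem\_demo} and the value 42 are chosen for
this illustration only, and the sketch assumes a system that implements
unnamed semaphores.

\begin{verbatim}
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

static sem_t sem;
static int shared;      /* written by the poster, read by the waiter */

static void *
poster(void *arg)
{
    (void) arg;
    shared = 42;
    /* Unlike a mutex, a semaphore may be posted by any thread. */
    (void) sem_post(&sem);
    return (NULL);
}

/* Returns the value handed over by the poster thread, -1 on error. */
int
sem_demo(void)
{
    pthread_t tid;
    int value;

    /* pshared 0: threads of one process only; value 0: "locked". */
    if (sem_init(&sem, 0, 0) == -1)
        return (-1);
    if (pthread_create(&tid, NULL, poster, NULL) != 0) {
        (void) sem_destroy(&sem);
        return (-1);
    }
    /* sem_wait() returns -1 and sets errno; retry if interrupted. */
    while (sem_wait(&sem) == -1 && errno == EINTR)
        ;
    value = shared;

    (void) pthread_join(tid, NULL);
    (void) sem_destroy(&sem);
    return (value);
}
\end{verbatim}
\end{itemize}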

%%%%%

\begin{slide}
\sltitle{Typical use of threads}

\begin{itemize}
\item \emsl{pipeline}
\begin{itemize}
\item each of the threads performs its own operation on data,
which is being passed between threads.
\item each thread performs different operation
\item[\dots] image processing, each thread will apply different filter
\end{itemize}

\item \emsl{work crew}
\begin{itemize}
\item the threads perform the same operation on different data
\item[\dots] image processing using decomposition -- each thread
processes different part of the image, the result is combination of
the results from all the threads; barrier is useful here.
\end{itemize}

\item \emsl{client -- server}
\end{itemize}
\end{slide}

\begin{itemize}
\item This differentiation is coarse, threads can be used in many different
ways, these are the most common cases.
\item In the case of the client -- server model, each server thread processes
one request from a single client.
\end{itemize}

%%%%%

\pdfbookmark[1]{thread-safe and reentrant functions}{thrsafe}

\begin{slide}
\sltitle{Thread-safe versus reentrant}

\begin{itemize}
\item \emph{thread-safe} means that the function can be called from multiple
threads in parallel without destructive consequences
\begin{itemize}
\item a function which was not designed to be thread-safe can be
converted into one that is -- by inserting a locked section.
\item this is obviously not very efficient
\end{itemize}
\item \emph{reentrant} typically means that the function was designed
with threads in mind
\begin{itemize}
\item \dots{}i.e. it works efficiently in multithreaded environment
\item such function should avoid using static data and if possible also
avoid using thread synchronization primitives because they slow down 
the run.
\end{itemize}
\end{itemize}
\end{slide}

\hlabel{THREADSAFE}

\begin{itemize}
\item The consequence of the above is that thread-safe is a weaker property
than reentrant. It is possible to write a thread-safe function using
synchronization primitives; rewriting an existing function so that it is
reentrant requires much more ingenuity.
\item The reentrant functions are also the only usable functions in
signal handlers.
\item These days thread-safe usually means reentrant, however it does not
hurt to know the difference.
\item For more on library locking see also page \pageref{MUTEXES2}.
\item There are a number of functions that can be thread-safe but are
not reentrant, e.g. \texttt{gethostbyname}. Normally this function uses
a static variable that is reused for each call, which makes it unsuitable for
use in a multithreaded environment -- it is not thread-safe.
However, on FreeBSD 6.0 it is implemented so that it implicitly uses
thread-local storage for saving input data and this makes it
thread-safe. That said, it does not make it totally safe to use
(not to mention that a program relying on this behavior is not portable),
see example \example{reentrant/gethostbyname.c}.
A bit better is to use the reentrant version of this function called
\texttt{gethostbyname\_r} (if it is available on the given system),
which takes the address of where the result will be stored as a parameter,
which makes it reentrant. Much better is to use the standard function
\texttt{getaddrinfo} (see page \pageref{GETADDRINFO}), which is reentrant
by itself.

\item Example: \example{reentrant/inet\_ntoa.c} -- shows that even
a function written this way (i.e. using thread-local storage) does not help
if it is called twice within the same call of \texttt{printf}. Each time it
returns a pointer with the same address (within one thread);
\texttt{printf} only records the pointers and reads their contents at the
final printing, and therefore prints the representation of the last address
passed to \texttt{inet\_ntoa} both times.
On Solaris it is possible to observe this with:
\begin{verbatim}
truss -t\!all -u libnsl::inet_ntoa ./a.out
\end{verbatim}
\item Man pages on Solaris contain an item called
\texttt{MT-level} in the \texttt{ATTRIBUTES} section that states
whether the function can be used in a multithreaded environment and
under what constraints. The levels are described in the
\texttt{attributes(5)} man page.
\end{itemize}
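
\begin{itemize}
\item One way to avoid the \texttt{inet\_ntoa} problem described above is to
let the caller supply a separate buffer for every conversion, for example with
the standard function \texttt{inet\_ntop}. The following is only a minimal
sketch that assumes IPv4 addresses; \texttt{format\_pair} is a hypothetical
helper and is not part of the course examples:
\begin{verbatim}
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>

/*
 * Print two IPv4 addresses into out, separated by a space.
 * Each address is converted into its own buffer, so the second
 * conversion cannot overwrite the result of the first one.
 * Returns 0 on success and -1 on failure.
 */
int
format_pair(char *out, size_t outsz, struct in_addr a, struct in_addr b)
{
        char s1[INET_ADDRSTRLEN], s2[INET_ADDRSTRLEN];
        int n;

        if (inet_ntop(AF_INET, &a, s1, sizeof (s1)) == NULL ||
            inet_ntop(AF_INET, &b, s2, sizeof (s2)) == NULL)
                return (-1);

        n = snprintf(out, outsz, "%s %s", s1, s2);
        if (n < 0 || (size_t)n >= outsz)
                return (-1);    /* error or truncated output */
        return (0);
}
\end{verbatim}
Since no static or thread-local buffer is involved, both strings stay valid
until the final \texttt{snprintf} call reads them.
\end{itemize}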

%%%%%

ifdef([[[NOSPELLCHECK]]], [[[
\pdfbookmark[1]{pthread\_set\_name\_np, pthread\_main\_np}{threadnpapis}
]]])

\begin{slide}
\sltitle{Non-portable thread APIs}

\begin{itemize}
\item non-portable APIs have the \texttt{\_np} suffix (\emph{non-portable}) and
individual systems define their own calls.

\item FreeBSD, Solaris
\begin{itemize}
\item \funnm{pthread\_set\_name\_np}\texttt{(pthread\_t tid, const char *name)}
\item[$\rightarrow$] can give a name to a thread
\end{itemize}

\item Solaris
\begin{itemize}
\item \funnm{pthread\_cond\_reltimedwait\_np}\texttt{(\dots)}
\item[$\rightarrow$] like \texttt{timedwait}, however the timeout is relative
\end{itemize}

\item OpenBSD, macOS
\begin{itemize}
\item \texttt{int} \funnm{pthread\_main\_np}\texttt{(void)}
\item[$\rightarrow$] find out if calling thread is the main one
(= \texttt{main()})
\end{itemize}

\end{itemize}
\end{slide}

\begin{itemize}
\item For naming threads, Linux distributions might have
\texttt{pthread\_setname\_np} (i.e. without the underscore after
\texttt{set}, unlike Solaris/FreeBSD), which further illustrates the
non-portable character of these calls.
Named threads might be visible in debuggers or in the output
of utilities such as \texttt{ps} (for the latter, that would be in the
\texttt{COMMAND} column, which is usually reserved for the program path and its
arguments).
\item Non-portable calls should be used only for system programs or for
debugging. You never know when the code will be executed on a different system,
which typically happens once you have left the company and no longer have the
time to fix it.
\item To find out which non-portable APIs are available on the system,
try running \texttt{apropos \_np}, or use brute force (substitute the location
of man pages on the given system):
\begin{verbatim}
$ cd /usr/share/man && find . -type f -name '*_np\.*'
./man3c/mq_reltimedreceive_np.3c
./man3c/mq_reltimedsend_np.3c
./man3c/posix_spawnattr_getsigignore_np.3c
./man3c/posix_spawnattr_setsigignore_np.3c
./man3c/pthread_cond_reltimedwait_np.3c
./man3c/pthread_key_create_once_np.3c
./man3c/pthread_mutex_reltimedlock_np.3c
./man3c/pthread_rwlock_reltimedrdlock_np.3c
./man3c/pthread_rwlock_reltimedwrlock_np.3c
./man3c/sem_reltimedwait_np.3c
\end{verbatim}
\end{itemize}
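
\begin{itemize}
\item Where portability matters, a non-portable call can often be replaced
with a few lines of standard code. As a sketch only, a relative timeout similar
in spirit to \texttt{pthread\_cond\_reltimedwait\_np} can be built on top of
\texttt{pthread\_cond\_timedwait}. The name \texttt{cond\_wait\_rel} is
hypothetical, and the code assumes that the condition variable uses the
default clock (\texttt{CLOCK\_REALTIME}):
\begin{verbatim}
#include <errno.h>
#include <pthread.h>
#include <time.h>

/*
 * Wait on cv for at most the relative time rel.
 * The mutex m must be locked by the caller, exactly as for
 * pthread_cond_timedwait. Returns 0, ETIMEDOUT or another
 * error number.
 */
int
cond_wait_rel(pthread_cond_t *cv, pthread_mutex_t *m,
    const struct timespec *rel)
{
        struct timespec deadline;

        if (clock_gettime(CLOCK_REALTIME, &deadline) != 0)
                return (errno);

        /* absolute deadline = current time + relative timeout */
        deadline.tv_sec += rel->tv_sec;
        deadline.tv_nsec += rel->tv_nsec;
        if (deadline.tv_nsec >= 1000000000L) {
                deadline.tv_sec++;
                deadline.tv_nsec -= 1000000000L;
        }
        return (pthread_cond_timedwait(cv, m, &deadline));
}
\end{verbatim}
As with any wait on a condition variable, the caller still has to re-check its
predicate in a loop; note that every retry starts the relative timeout again.
\end{itemize}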

\pagebreak

\endinput