Fix several data races #488

ijackson · 2025-01-27T13:11:23Z

There were several places where the code in libfaketime.c used ad-hoc synchronisation (eg for doing the initialisation only once) using non-atomic C variables. In a multithreaded program, this is not correct. The C language rules are that data races like this are completely forbidden, and the compiler is allowed to optimise the code using the assumption that such races do not ever occur.

In practice, while trying to fix faketime in Debian trixie, I found that faketime misbehaved in CI, in the test suites of various other packages. Notably, the Ruby interpreter, but other cases too. Some cases involved stack traces, and others "impossible" results. In each case, I examined the output from the test case, and the apparently-relevant code in libfaketime.c. In each case, I found code in libfaketime.c that seemed to be to be incorrect. Fixing the data races, by using the correct constructions and the correct primitives, fixed all the tests.

It's likely that the approach I've taken won't compile on some platforms targeted by libfaketime. I have relied on the existence of pthread_once. For platforms without pthread_once, there will hopefully be an analogous function. If there are platforms without an anlogous function, there is an obvious implementation in terms of a mutex, at the cost of some loss of performance. I would strongly counsel against attempts to open-code using spin locks and atomics, unless the code can be both written, and separately reviewed, by at least two experts in these techniques. In any case, implementations using ordinary non-atomic variables cannot be correct and will lead to race bugs. Sorry that I haven't provided support for those other platforms.

I think the final patch "Don't use try locking calls for monotonic_conds_lock" will fix #464.

Together with #487 and #486, these race fixes fix the CI failures with packages' test suites, in Debian.

At the cost of no longer nicely detecting recursive initialisation problems. Fixes Debian bug #1093599

Otherwise we can use this in an uninitialised state, which is not allowed. We call ftpl_init in pthread_cond_init_232, but the application might not have called that. For example, it might have a static condition variable set up with PTHREAD_COND_INITIALIZER.

This reverts commit 8ef74e3 "Swapped out pthread_rwlock_xxlock() ..." This could result in concurrent uses of pthread_cond_* erroneously returning EAGAIN, which is not permitted by the spec and which the application way well treat as a bug. This seems to be happening in gem2deb in ci.debian.net. The commit message in 8ef74e3 says (rewrapped) Swapped out pthread_rwlock_xxlock(), which doesn't return if it can't obtain the lock, with pthread_rwlock_xxtrylock() followed by sched yield and error code return. The issue is sometimes a thread calling pthread_cond_init() or pthread_cond_destroy() can't acquire the lock when another thread is waiting on a condition variable notification via pthread_cond_timedwait(), and thus the thread calling pthread_cond_init() or pthread_cond_destroy() end up hanging indefinitely. I don't think this is true. The things that are done with monotonic_conds_lock held are HASH_ADD_PTR HASH_FIND_PTR etc. on monotonic_conds, which should all be fast and AFAICT don't in turn take any locks. So it shouldn't deadlock. I conjecture that the underlying bug being experienced by the author of "Swapped out pthread_rwlock_xxlock" was the lack of ftpl_init - ie, access to an uninitialised pthread_rwlock_t. That might result in a hang.

ijackson · 2025-01-27T13:13:16Z

(force pushed to fix a Debianism in one of the bug references in a commit message.)

ijackson added 4 commits January 27, 2025 13:12

Replace data race with use of pthread_once (ftpl_init)

2503b0f

At the cost of no longer nicely detecting recursive initialisation problems. Fixes Debian bug #1093599

Replace data race with use of pthread_once (ft_shm_init)

d9ba684

ijackson force-pushed the races branch from 74bab27 to 50e2c56 Compare January 27, 2025 13:12

wolfcw self-assigned this Jan 27, 2025

ijackson mentioned this pull request Jan 28, 2025

Unexpected return values from pthread_cond_init_232, pthread_cond_destroy_232, pthread_cond_timedwait #464

Closed

wolfcw merged commit b5a48c8 into wolfcw:master Jan 28, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix several data races #488

Fix several data races #488

ijackson commented Jan 27, 2025

ijackson commented Jan 27, 2025

Fix several data races #488

Fix several data races #488

Conversation

ijackson commented Jan 27, 2025

ijackson commented Jan 27, 2025