There were several places where the code in libfaketime.c used ad-hoc synchronisation (e.g. for doing the initialisation only once) using non-atomic C variables. In a multithreaded program, this is not correct. The C language rules are that data races like this are completely forbidden, and the compiler is allowed to optimise the code on the assumption that such races never occur.

In practice, while trying to fix faketime in Debian trixie, I found that faketime misbehaved in CI, in the test suites of various other packages: notably the Ruby interpreter, but other cases too. Some cases involved stack traces, and others "impossible" results. In each case, I examined the output from the test case and the apparently-relevant code in libfaketime.c, and found code in libfaketime.c that seemed to me to be incorrect. Fixing the data races, by using the correct constructions and the correct primitives, fixed all the tests.
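For illustration only, here is a minimal sketch of once-only initialisation done with pthread_once rather than a plain flag variable. This is not the code from the patches: every name in it (init_once, do_init, ensure_initialised, run_demo) is hypothetical, and the initialiser just counts how often it runs.

```c
#include <assert.h>
#include <pthread.h>

/* All names here are hypothetical; none are taken from libfaketime.c. */
static pthread_once_t init_once = PTHREAD_ONCE_INIT;
static int init_calls = 0;          /* written only inside do_init() */

/* Stand-in for the real one-time setup work. */
static void do_init(void)
{
    init_calls++;
}

/* Safe to call from any thread, any number of times:
 * pthread_once runs do_init exactly once and makes its effects
 * visible to every caller before returning. */
static void ensure_initialised(void)
{
    int rc = pthread_once(&init_once, do_init);
    assert(rc == 0);
    (void)rc;
}

static void *worker(void *arg)
{
    (void)arg;
    ensure_initialised();
    return NULL;
}

/* Starts several threads that all race to initialise and
 * returns how many times do_init actually ran. */
static int run_demo(void)
{
    pthread_t threads[8];
    int rc;

    for (int i = 0; i < 8; i++) {
        rc = pthread_create(&threads[i], NULL, worker, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < 8; i++) {
        rc = pthread_join(threads[i], NULL);
        assert(rc == 0);
    }
    (void)rc;
    return init_calls;
}
```

The racy pattern this replaces would be something like `if (!initialised) { do_init(); initialised = 1; }` on an ordinary int, which is a data race as soon as two threads reach it concurrently.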
It's likely that the approach I've taken won't compile on some platforms targeted by libfaketime. I have relied on the existence of pthread_once. For platforms without pthread_once, there will hopefully be an analogous function. If there are platforms without an analogous function, there is an obvious implementation in terms of a mutex, at the cost of some loss of performance. I would strongly counsel against attempts to open-code this using spin locks and atomics, unless the code can be both written, and separately reviewed, by at least two experts in these techniques. In any case, implementations using ordinary non-atomic variables cannot be correct and will lead to race bugs. Sorry that I haven't provided support for those other platforms.
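As a rough sketch of the mutex-based fallback mentioned above, the following assumes a platform that has a pthread-style mutex but no pthread_once; on a platform with a different mutex API the calls would need to be substituted. All names (fallback_lock, fallback_done, fallback_ensure_initialised, and so on) are hypothetical and not part of the patches.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical names; assumes a pthread-style mutex is available. */
static pthread_mutex_t fallback_lock = PTHREAD_MUTEX_INITIALIZER;
static bool fallback_done = false;  /* read and written only with fallback_lock held */
static int fallback_calls = 0;      /* written only inside fallback_do_init() */

/* Stand-in for the real one-time setup work. */
static void fallback_do_init(void)
{
    fallback_calls++;
}

/* Every call takes the mutex, so the flag is never accessed
 * concurrently. This is the performance cost: even after
 * initialisation, each caller pays for a lock and an unlock. */
static void fallback_ensure_initialised(void)
{
    int rc = pthread_mutex_lock(&fallback_lock);
    assert(rc == 0);
    if (!fallback_done) {
        fallback_do_init();
        fallback_done = true;
    }
    rc = pthread_mutex_unlock(&fallback_lock);
    assert(rc == 0);
    (void)rc;
}

static void *fallback_worker(void *arg)
{
    (void)arg;
    fallback_ensure_initialised();
    return NULL;
}

/* Starts several racing threads and returns how many times
 * the initialiser actually ran. */
static int fallback_run_demo(void)
{
    pthread_t threads[8];
    int rc;

    for (int i = 0; i < 8; i++) {
        rc = pthread_create(&threads[i], NULL, fallback_worker, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < 8; i++) {
        rc = pthread_join(threads[i], NULL);
        assert(rc == 0);
    }
    (void)rc;
    return fallback_calls;
}
```

Note that this sketch would deadlock if the initialiser itself called fallback_ensure_initialised, since the mutex is not recursive; whether that matters depends on what the real initialiser does.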
I think the final patch "Don't use try locking calls for monotonic_conds_lock" will fix #464.
Together with #487 and #486, these race fixes fix the CI failures with packages' test suites, in Debian.