Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

<thread>: this_thread::sleep_until uses system time even if based on steady_clock #718

Closed
AlexGuteniev opened this issue Apr 15, 2020 · 24 comments · Fixed by #4457
Closed
Labels
bug Something isn't working fixed Something works now, yay!

Comments

@AlexGuteniev
Copy link
Contributor

AlexGuteniev commented Apr 15, 2020

Describe the bug
Internally, sleep_until calls _To_xtime_10_day_clamped to convert time point of any clock to time point of System clock. Then _Thrd_sleep sleeps on this system time point:

STL/stl/inc/thread

Lines 154 to 166 in 3141965

template <class _Clock, class _Duration>
void sleep_until(const chrono::time_point<_Clock, _Duration>& _Abs_time) {
for (;;) {
const auto _Now = _Clock::now();
if (_Abs_time <= _Now) {
return;
}
_CSTD xtime _Tgt;
(void) _To_xtime_10_day_clamped(_Tgt, _Abs_time - _Now);
_Thrd_sleep(&_Tgt);
}
}

_To_xtime_10_day_clamped uses system_clock

STL/stl/inc/chrono

Lines 629 to 652 in 3141965

// HELPERS
template <class _Rep, class _Period>
_NODISCARD bool _To_xtime_10_day_clamped(_CSTD xtime& _Xt, const chrono::duration<_Rep, _Period>& _Rel_time) noexcept(
is_arithmetic_v<_Rep>) {
// Convert duration to xtime, maximum 10 days from now, returns whether clamping occurred.
// If clamped, timeouts will be transformed into spurious non-timeout wakes, due to ABI restrictions where
// the other side of the DLL boundary overflows int32_t milliseconds.
// Every function calling this one is TRANSITION, ABI
constexpr chrono::nanoseconds _Ten_days{chrono::hours{24} * 10};
constexpr chrono::duration<double> _Ten_days_d{_Ten_days};
chrono::nanoseconds _T0 = chrono::system_clock::now().time_since_epoch();
const bool _Clamped = _Ten_days_d < _Rel_time;
if (_Clamped) {
_T0 += _Ten_days;
} else {
_T0 += chrono::duration_cast<chrono::nanoseconds>(_Rel_time);
}
const auto _Whole_seconds = chrono::duration_cast<chrono::seconds>(_T0);
_Xt.sec = _Whole_seconds.count();
_T0 -= _Whole_seconds;
_Xt.nsec = static_cast<long>(_T0.count());
return _Clamped;
}

If system_clock is adjusted between _To_xtime_10_day_clamped and _Thrd_sleep, the sleep interval is wrong.

sleep_for is also subject to this problem as it is based on sleep_until

STL/stl/inc/thread

Lines 168 to 171 in 3141965

template <class _Rep, class _Period>
void sleep_for(const chrono::duration<_Rep, _Period>& _Rel_time) {
sleep_until(chrono::steady_clock::now() + _Rel_time);
}

Though standard says

The functions whose names end in _­for take an argument that specifies a duration. These functions produce relative timeouts. Implementations should use a steady clock to measure time for these functions.

Command-line test case
Hard to build stable repro, but will try if needed.

Expected behavior
Neither sleep_until relative to a steady clock nor sleep_for should depend on system_clock.

GetTickCount64 is good enough steady clock, as Sleep API does not have granularity less than a millisecond anyway (calling undocumented NtDelayExecution is unacceptable). GetTickCount64 is very fast clock, it is basically reading a global variable.

STL version

Microsoft Visual Studio Professional 2019 Preview
Version 16.6.0 Preview 2.1

Additional context
Suspect that tests for #593 are failing due to timeout because other tests simultaneously tamper with system clock. Ok, this theory is defeated now, but the bug still looks valid

@StephanTLavavej StephanTLavavej added bug Something isn't working vNext Breaks binary compatibility labels Apr 15, 2020
@StephanTLavavej
Copy link
Member

Yep, this is a known bug that affects bincompat, thanks for reporting it.

@StephanTLavavej StephanTLavavej removed the vNext Breaks binary compatibility label Apr 15, 2020
@StephanTLavavej
Copy link
Member

After 5 minutes of additional thought, we've realized that this might be fixable by adding new functions to the import library (without changing the behavior of the DLL, which must remain unchanged for previously compiled code). We hadn't considered this solution before, for the thread sleep issue, despite using the import lib for other features/fixes. 🧠

@AlexGuteniev
Copy link
Contributor Author

My observations here:

  1. There are currently three occurrences of this pattern:
  • this_thread::sleep_{for|until}
  • condition_variable::sleep_{for|until}
  • condition_variable_any::sleep_{for|until}
    Others, like timed_mutex are based on one of CVs.
    For consistency, all need to be fixed by a single change.
  1. As I mentioned, imprecise GetTickCount64 will do. Currently, _Xtime_get_ticks calls __crtGetSystemTimePreciseAsFileTime, which has unnecessary overhead.

  2. While converting chrono time to imprecise time, must round up, not round down. Sleep should sleep "at least for".

  3. There is unnecessary conversion back and forth. this_thread::sleep_for goes to this_thread_sleep_until, then sleep_until forms system time point and ultimately uses for Windows API (as there are no public "until" APIs). I think extra conversion is fine as long as it does not impact performance and does not cause interval to be shorter (that is 2 and 3 are addressed).

  4. I've exposed GetTickCount64 as a part of Implementation of std::atomic::wait #593, Sleep is also internally used there and can be exposed. Yet, I think import library solution is better to fix this bug, instead of waiting satellite. (Moreover, if fix is committed before waiting, Implementation of std::atomic::wait #593 can be switched to use new import library functions.

  5. There is reverse version of this bug. System clock based sleep_until may not be sensitive to system clock adjustment. See [thread.req.timing]/4. But I think that whereas this bug could be fixed, it is impractical. The fix would mean that all waits on non-steady clock are proxied via another variable that is waked on system clock forward adjustment. Should this still be filled as a bug?

@AlexGuteniev
Copy link
Contributor Author

6. There is reverse version of this bug. System clock based sleep_until may not be sensitive to system clock adjustment. See [thread.req.timing]/4. But I think that whereas this bug could be fixed, it is impractical. The fix would mean that all waits on non-steady clock are proxied via another variable that is waked on system clock forward adjustment. Should this still be filled as a bug?

Is sounds more like a Standard defect to me. Standard defines this hard to implement feature, but acknowledges it is not useful:

[ Note: If the clock is not synchronized with a steady clock, e.g., a CPU time clock, these timeouts might not provide useful functionality. — end note ]

@AlexGuteniev
Copy link
Contributor Author

So here are three functions that needs to be replaced with other functions:

_CRTIMP2_PURE void __cdecl _Thrd_sleep(const xtime*);
_CRTIMP2_PURE int __cdecl _Mtx_timedlock(_Mtx_t, const xtime*);
_CRTIMP2_PURE int __cdecl _Cnd_timedwait(_Cnd_t, _Mtx_t, const xtime*);

Apparently they can be re-implemented in a new translation unit injected into the import library.
The most complex is _Mtx_timedlock, but still copying existing function should do the trick.

New functions could be made taking either time point relative to GetTickCount64 or duration. Duration seems even better to me, as there are no unnecessary conversions, when program passes duration.

Duration can be unsigned long milliseconds (that directly match current Windows API) or long long nanoseconds (that provides potential for precision improvement for future, if more precise Windows APIs are ever exposed)

@BillyONeal
Copy link
Member

AFAIK they can't be implemented in the import library because we might choose ConcRT backing for the synchronization primitives, and ConcRT depends on global state in the STL and ConcRT DLLs that will be damaged trying to do this from outside.

Fixing this_thread:: things might be doable in an import library change because they don't depend on anything else. What I did in our vMajorNext branch was call NtWaitForSingleObject with a handle of GetCurrentThread(). We know that the current thread is running so the handle will never be signaled, so NtWaitForSingleObject must time out. NtWaitForSingleObject was chosen over WaitForSingleObject because the NtXxx version accepts both absolute and relative timeouts (according to the sign of the supplied LARGE_INTEGER).

I recommend just waiting until we can do all the vNext things though. Sleeping a thread isn't a super useful tool anyway.

@AlexGuteniev
Copy link
Contributor Author

I think the use of NtWait-whatever for sleep is great idea, but isn't it cheating in sight of #705 ?

@BillyONeal
Copy link
Member

I believe NtWaitForSingleObject is documented: https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-zwwaitforsingleobject

@AlexGuteniev
Copy link
Contributor Author

It is driver kit, not SDK. In theory it might mean higher risk for this to be broken in future OS. In practice I don't think so, not at least for this function.

@AlexGuteniev
Copy link
Contributor Author

AFAIK they can't be implemented in the import library because we might choose ConcRT backing for the synchronization primitives, and ConcRT depends on global state in the STL and ConcRT DLLs that will be damaged trying to do this from outside.

I actually don't get this problem. Say there's exactly the same function as mtx_do_lock, which takes duration or time point relative to a steady click, coexisting with old function. What's the problem?

@BillyONeal
Copy link
Member

The implementation of the new function needs to do waits / wake waiters / etc. in a way that existing .objs that don't call the new function aren't broken. I don't know exactly how the ConcRT backend works but I'm pretty sure it has some globals that will be inaccessible from the import library.

@AlexGuteniev
Copy link
Contributor Author

AlexGuteniev commented Apr 24, 2020

I think you are referring to this:

STL/stl/src/primitives.h

Lines 22 to 45 in 0cbd1b2

enum class __stl_sync_api_modes_enum { normal, win7, vista, concrt };
extern __stl_sync_api_modes_enum __stl_sync_api_impl_mode;
namespace Concurrency {
namespace details {
class __declspec(novtable) stl_critical_section_interface {
public:
virtual void lock() = 0;
virtual bool try_lock() = 0;
virtual bool try_lock_for(unsigned int) = 0;
virtual void unlock() = 0;
virtual void destroy() = 0;
};
class __declspec(novtable) stl_condition_variable_interface {
public:
virtual void wait(stl_critical_section_interface*) = 0;
virtual bool wait_for(stl_critical_section_interface*, unsigned int) = 0;
virtual void notify_one() = 0;
virtual void notify_all() = 0;
virtual void destroy() = 0;
};

STL/stl/src/primitives.h

Lines 288 to 316 in 0cbd1b2

inline void create_stl_critical_section(stl_critical_section_interface* p) {
#ifdef _CRT_WINDOWS
new (p) stl_critical_section_win7;
#else
switch (__stl_sync_api_impl_mode) {
case __stl_sync_api_modes_enum::normal:
case __stl_sync_api_modes_enum::win7:
if (are_win7_sync_apis_available()) {
new (p) stl_critical_section_win7;
return;
}
// fall through
case __stl_sync_api_modes_enum::vista:
if (are_vista_sync_apis_available()) {
new (p) stl_critical_section_vista;
return;
}
// fall through
case __stl_sync_api_modes_enum::concrt:
default:
#ifdef _STL_CONCRT_SUPPORT
new (p) stl_critical_section_concrt;
return;
#else
std::terminate();
#endif // _STL_CONCRT_SUPPORT
}
#endif // _CRT_WINDOWS
}

These APIs are already in relative units.
xtime* - system time functions are built on top of them.
So new functions can be built on top of them too, there's no problem.

@AlexGuteniev
Copy link
Contributor Author

In fact the decision is made when mutex or condition_variable[_any] is created.
The timed functions just perform virtual function calls.

@BillyONeal
Copy link
Member

If you can get it to work I'm not going to dispute an existence proof

@shmuelie
Copy link

Any update on this? We've run into this issue

@AlexGuteniev
Copy link
Contributor Author

Not from me. I'm not interested in contributing a fix or other changes in near time.

Although due to cleanup in mutexes area recently performed by @StephanTLavavej and me, there's no more case __stl_sync_api_modes_enum::concrt: case, so the fix I have in mind would be a bit easier to test out.

@AlexGuteniev
Copy link
Contributor Author

Actually I think @BillyONeal would not question my idea anymore, as his concern was ConCRT:

AFAIK they can't be implemented in the import library because we might choose ConcRT backing for the synchronization primitives, and ConcRT depends on global state in the STL and ConcRT DLLs that will be damaged trying to do this from outside.

and like I said, it was ConCRT case was recently wiped out.
Though Billy will not participate in near time either.

But it is fixable, possibly if you want the fix that much, you can contribute it....

@AnyOldName3
Copy link

AFAIK they can't be implemented in the import library because we might choose ConcRT backing for the synchronization primitives, and ConcRT depends on global state in the STL and ConcRT DLLs that will be damaged trying to do this from outside.

Fixing this_thread:: things might be doable in an import library change because they don't depend on anything else. What I did in our vMajorNext branch was call NtWaitForSingleObject with a handle of GetCurrentThread(). We know that the current thread is running so the handle will never be signaled, so NtWaitForSingleObject must time out. NtWaitForSingleObject was chosen over WaitForSingleObject because the NtXxx version accepts both absolute and relative timeouts (according to the sign of the supplied LARGE_INTEGER).

I recommend just waiting until we can do all the vNext things though. Sleeping a thread isn't a super useful tool anyway.

Is there any actual need for waiting on the current thread with a timeout given that waitable timers exist and work with relative and absolute times with 100ns precision? https://docs.microsoft.com/en-us/windows/win32/sync/using-waitable-timer-objects shows the basic idea. It's all documented Win32 stuff, so no need to use NtXxx and ZwXxx functions. This is the approach used in OpenThreads for microsecond-precise sleeps: https://github.com/openscenegraph/OpenSceneGraph/blob/master/src/OpenThreads/win32/Win32Thread.cpp#L670-L698

Using that would fix the resolution issues caused by using millisecond-precise Sleep (which is what originally made me look into this), and presumably NtWaitForSingleObject uses the same clock as regular WaitForSingleObject, so if the NT one fixes the issue described in this report's title, the Win32 one should, too.

@BillyONeal
Copy link
Member

Is there any actual need for waiting on the current thread with a timeout given that waitable timers exist and work with relative and absolute times with 100ns precision?

Creating waitable timers can fail, but this_thread::sleep_for can't.

This is the approach used in OpenThreads for microsecond-precise sleeps:

The kernel does not provide microsecond sleeps in any form, even though there are some APIs that accept FILETIME. The scheduler operates by default at 1/64th of a second resolution, boostable to 1ms resolution: https://randomascii.wordpress.com/2020/10/04/windows-timer-resolution-the-great-rule-change/

Using that would fix the resolution issues caused by using millisecond-precise Sleep

Just because the API accepts better than milliseconds doesn't mean that the kernel scheduler will ever give that to you.

@AnyOldName3
Copy link

Fair enough.

@haferburg
Copy link

Any update on this? It's unfortunate that there's not even a workaround using STL.

Workarounds are to use Boost (tested with 1.67) or the Win32 API directly.

@WenceyWang
Copy link

Any update on this?

@yinkaisheng
Copy link

When will it be fixed?

@AlexGuteniev
Copy link
Contributor Author

When will it be fixed?

It is already fixed for this_thread
For condition_variable and condition_variable_any it will be fixed when a PR with a fix will be merged.

AlexGuteniev added a commit to AlexGuteniev/STL that referenced this issue Mar 8, 2024
@StephanTLavavej StephanTLavavej added the fixed Something works now, yay! label Mar 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working fixed Something works now, yay!
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants