Reapply revert of https://github.com/dotnet/runtime/pull/97227, fix Lock's waiter duration computation #98876
Tagging subscribers to this area: @mangod9
Issue Details
PR #97227 introduced a tick count masking issue where the stored waiter start time excludes the upper bit from the ushort tick count, but comparisons with it were not doing the appropriate masking. This was leading to a lock convoy on some heavily contended locks once in a while due to waiters incorrectly appearing to have waited for a long time.
  return
      waiterStartTimeMs != 0 &&
-     (ushort)Environment.TickCount - waiterStartTimeMs >= MaxDurationMsForPreemptingWaiters;
+     (Environment.TickCount & 0x7fff) - waiterStartTimeMs >= MaxDurationMsForPreemptingWaiters;
Suggested change:
-     (Environment.TickCount & 0x7fff) - waiterStartTimeMs >= MaxDurationMsForPreemptingWaiters;
+     (ushort)(Environment.TickCount - waiterStartTimeMs) >= MaxDurationMsForPreemptingWaiters;
to deal better with wrap around?
The way you have it written, we can get a false negative every 32s, since (Environment.TickCount & 0x7fff) can be more than MaxDurationMsForPreemptingWaiters, waiterStartTimeMs can be near short.MaxValue, and (Environment.TickCount & 0x7fff) - waiterStartTimeMs will be a large negative value that makes the condition false even though the elapsed time is more than MaxDurationMsForPreemptingWaiters.
Thanks, fixed. I think the & 0x7fff is needed on the computed duration, since the recorded start time excludes the upper bit of a ushort. For instance, if the actual start time was 0xffff, 0x7fff would be recorded, and if the current time is 0x10000, the diff would be 0x8001 ms instead of 1 ms.
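The arithmetic in that example can be checked with a small standalone sketch. It is not the Lock implementation: the helper names ElapsedMs15 and ElapsedMaskFirst are hypothetical, and the tick values are the ones quoted in the comment above. It is written as a C# 9+ top-level program.

```csharp
using System;

// Hypothetical helper: elapsed milliseconds when only the low 15 bits of the
// start tick count were recorded. The mask is applied to the difference.
static int ElapsedMs15(int currentTickCount, int recordedStartMs)
    => (currentTickCount - recordedStartMs) & 0x7fff;

// Hypothetical helper: the mask-before-subtract form flagged earlier in the review.
static int ElapsedMaskFirst(int currentTickCount, int recordedStartMs)
    => (currentTickCount & 0x7fff) - recordedStartMs;

int actualStart = 0xffff;
int recorded = actualStart & 0x7fff; // upper bit of the ushort is dropped: 0x7fff
int now = 0x10000;                   // 1 ms after the actual start

Console.WriteLine(ElapsedMs15(now, recorded));      // 1
Console.WriteLine((ushort)(now - recorded));        // 32769 (0x8001), cast alone is not enough
Console.WriteLine(ElapsedMaskFirst(now, recorded)); // -32767, never reaches a positive threshold
```

Masking the difference rather than the operands keeps the result in the 0..0x7fff range across the 15-bit wrap.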
  return
      waiterStartTimeMs != 0 &&
-     (ushort)Environment.TickCount - waiterStartTimeMs >= MaxDurationMsForPreemptingWaiters;
+     (ushort)((Environment.TickCount - waiterStartTimeMs) & 0x7fff) >= MaxDurationMsForPreemptingWaiters;
Suggested change:
-     (ushort)((Environment.TickCount - waiterStartTimeMs) & 0x7fff) >= MaxDurationMsForPreemptingWaiters;
+     ((Environment.TickCount - waiterStartTimeMs) & 0x7fff) >= MaxDurationMsForPreemptingWaiters;
The cast to ushort is unnecessary.
I think that the ushort WaiterStartTimeMs property is misleading given that only 15 bits are actually valid. The type of the property can be int with a comment that only the lower 15 bits are valid. Nearly all callers need to be aware of it.
There is another subtle bug a few lines above:
    if (currentTimeMs == 0)
    {
        // Don't record zero, that value is reserved for indicating that a time is not recorded
        currentTimeMs--;
    }
This won't work correctly when currentTimeMs is 0x8000. It is going to be recorded as 0, which is not correct.
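A minimal sketch of that zero-reservation problem follows, assuming the recorded value is simply the tick count with the upper bit masked off. RecordBuggy and RecordMaskFirst are hypothetical names, and the second helper shows only one possible way to avoid the issue, not necessarily the fix that was applied in the PR.

```csharp
using System;

// Hypothetical model of the flagged code: zero is checked before the upper
// bit is dropped, so 0x8000 slips through and is stored as 0.
static ushort RecordBuggy(ushort currentTimeMs)
{
    if (currentTimeMs == 0)
    {
        unchecked { currentTimeMs--; } // wraps to 0xffff
    }
    return (ushort)(currentTimeMs & 0x7fff);
}

// One possible alternative (an assumption): mask first, then step back by
// 1 ms within the 15-bit range if the masked value is the reserved zero.
static ushort RecordMaskFirst(ushort currentTimeMs)
{
    ushort masked = (ushort)(currentTimeMs & 0x7fff);
    return masked == 0 ? (ushort)0x7fff : masked;
}

Console.WriteLine(RecordBuggy(0x8000));     // 0, indistinguishable from "not recorded"
Console.WriteLine(RecordBuggy(0));          // 32767
Console.WriteLine(RecordMaskFirst(0x8000)); // 32767
Console.WriteLine(RecordMaskFirst(0));      // 32767
```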
Thanks again, fixed
I wonder if using the lowermost bit as the flag (as in #97227 (comment)) would make this simpler?
At least ushort WaiterStartTimeMs would not be misleading, as the whole ushort range would be in use.
Also, wrap around would happen in 60 sec., not in 30.
    private ushort WaiterStartTimeMs
    {
        get => (ushort)(_waiterStartTimeMsAndFlags & ~1);
        set => _waiterStartTimeMsAndFlags = (ushort)(value | (_waiterStartTimeMsAndFlags & 1));
    }
Then the same comparison as before might work:
(ushort)Environment.TickCount - WaiterStartTimeMs >= MaxDurationMsForPreemptingWaiters;
That would work, but would need to make sure that 0 is not recorded when the actual time is 1, so there's still a little bit of complication at the caller side. I was thinking I could just use a bit in the _state field to simplify. The waiter count has plenty of bits and one could be used for this instead. The waiter start time is stored and used in only a couple of places so it's probably ok as is, but I'll go ahead and make that change to use a bit in the _state field, it would be a bit cleaner.
Also, the previous code, from when the ushort field was being used directly:
    (ushort)Environment.TickCount - waiterStartTimeMs >= MaxDurationMsForPreemptingWaiters
doesn't work because of the implicit int promotion in the subtraction. E.g., if the recorded start time was (ushort)-100 and the current time as a ushort is 0, the result of the subtraction would be a large negative int value instead of 100, and would not trigger the heuristic. Additional casting is needed here to force ushort math.
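The promotion issue can be reproduced in isolation with the values from that comment. This is only an illustrative sketch: the threshold of 100 is a stand-in chosen for the example, not the real value of MaxDurationMsForPreemptingWaiters.

```csharp
using System;

// Stand-in threshold for illustration only; the real constant's value is not shown in this discussion.
const int HypotheticalMaxDurationMs = 100;

ushort start = unchecked((ushort)-100); // 0xff9c, i.e. 100 ms before the 16-bit wrap
ushort now = 0;                         // current time truncated to ushort

// Both ushort operands are promoted to int before the subtraction.
int promoted = now - start;             // -65436
// Casting the difference back to ushort restores 16-bit wraparound.
ushort wrapped = (ushort)(now - start); // 100

Console.WriteLine(promoted);                              // -65436
Console.WriteLine(wrapped);                               // 100
Console.WriteLine(promoted >= HypotheticalMaxDurationMs); // False
Console.WriteLine(wrapped >= HypotheticalMaxDurationMs);  // True
```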
Maybe using short instead would simplify, as the int promotion would sign-extend instead. Will consider.
That wouldn't work either, this expression would need extra casting it seems
Yes, it would need to be 16-bit math.
LGTM. Thanks!
Updated to use a bit in the
Using a bit from the _state seems cleaner.
LGTM!
If the max number of possible waiters becomes uncomfortable, i.e. if we think millions of threads will be reachable in some extreme cases, I think more bits could be taken from the spinner count.
Rebased
…waits (dotnet#97227)" (dotnet#98867) This reverts commit f129701.
PR dotnet#97227 introduced a tick count masking issue where the stored waiter start time excludes the upper bit from the ushort tick count, but comparisons with it were not doing the appropriate masking. This was leading to a lock convoy on some heavily contended locks once in a while due to waiters incorrectly appearing to have waited for a long time. Fixes dotnet#98021
Reapplies "Add an internal mode to Lock to have it use non-alertable waits" #97227 after it was reverted by Revert "Add an internal mode to Lock to have it use non-alertable w… #98867.
Fixes "Update some of NativeAOT's uses of Lock to use non-alertable waits" #97105.