Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reapply revert of https://github.com/dotnet/runtime/pull/97227, fix Lock's waiter duration computation #98876

Merged
merged 5 commits into from
Mar 7, 2024

Conversation

kouvel
Copy link
Member

@kouvel kouvel commented Feb 23, 2024

PR #97227 introduced a tick count masking issue where the stored waiter start time excludes the upper bit from the ushort tick count, but comparisons with it were not doing the appropriate masking. This was leading to a lock convoy on some heavily contended locks once in a while due to waiters incorrectly appearing to have waited for a long time.

@kouvel kouvel added this to the 9.0.0 milestone Feb 23, 2024
@kouvel kouvel requested review from jkotas and VSadov February 23, 2024 22:37
@kouvel kouvel self-assigned this Feb 23, 2024
@ghost
Copy link

ghost commented Feb 23, 2024

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

Issue Details

PR #97227 introduced a tick count masking issue where the stored waiter start time excludes the upper bit from the ushort tick count, but comparisons with it were not doing the appropriate masking. This was leading to a lock convoy on some heavily contended locks once in a while due to waiters incorrectly appearing to have waited for a long time.

Author: kouvel
Assignees: kouvel
Labels:

area-System.Threading

Milestone: 9.0.0

return
waiterStartTimeMs != 0 &&
(ushort)Environment.TickCount - waiterStartTimeMs >= MaxDurationMsForPreemptingWaiters;
(Environment.TickCount & 0x7fff) - waiterStartTimeMs >= MaxDurationMsForPreemptingWaiters;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(Environment.TickCount & 0x7fff) - waiterStartTimeMs >= MaxDurationMsForPreemptingWaiters;
(ushort)(Environment.TickCount - waiterStartTimeMs) >= MaxDurationMsForPreemptingWaiters;

to deal better with wrap around?

The way you have it written, we can get a false positive every 32s since (Environment.TickCount & 0x7fff) can be more than MaxDurationMsForPreemptingWaiters, waiterStartTimeMs can be near short.MaxValue, and (Environment.TickCount & 0x7fff) - waiterStartTimeMs will be large negative value that will make the condition false even through the elapsed time is more than MaxDurationMsForPreemptingWaiters.

Copy link
Member Author

@kouvel kouvel Feb 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed. I think the & 0x7fff is needed on the computed duration, since the recorded start time excludes the upper bit of a ushort. For instance, if the actual start time was 0xffff, 0x7fff would be recorded, and if the current time is 0x10000, the diff would be 0x8001 ms instead of 1 ms.

return
waiterStartTimeMs != 0 &&
(ushort)Environment.TickCount - waiterStartTimeMs >= MaxDurationMsForPreemptingWaiters;
(ushort)((Environment.TickCount - waiterStartTimeMs) & 0x7fff) >= MaxDurationMsForPreemptingWaiters;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(ushort)((Environment.TickCount - waiterStartTimeMs) & 0x7fff) >= MaxDurationMsForPreemptingWaiters;
((Environment.TickCount - waiterStartTimeMs) & 0x7fff) >= MaxDurationMsForPreemptingWaiters;

The cast to ushort is unnecessary.

I think that the ushort WaiterStartTimeMs property is misleading given that only 15 bits are actually valid. The type of the property can be int with a comment that only lower 15 bits are valid. Nearly all callers need to be aware of it.

There is another subtle bug a few lines above:

            if (currentTimeMs == 0)
            {
                // Don't record zero, that value is reserved for indicating that a time is not recorded
                currentTimeMs--;
            }

This won't work correctly when currentTimeMs is 0x8000. It is going to be recorded as 0 that is not correct.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again, fixed

Copy link
Member

@VSadov VSadov Feb 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if using the lowermost bit as the flag (as in #97227 (comment)) would make this simpler?

At least ushort WaiterStartTimeMs would not be misleading as the whole ushort range would be in use.
Also wrap around would happen in 60 sec. not in 30.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

        private ushort WaiterStartTimeMs
        {
            get => (ushort)(_waiterStartTimeMsAndFlags &  ~1);
            set => _waiterStartTimeMsAndFlags = (ushort)(value | (_waiterStartTimeMsAndFlags & 1));
        }

Then the same comparison as before might work:

(ushort)Environment.TickCount - WaiterStartTimeMs >= MaxDurationMsForPreemptingWaiters;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would work, but would need to make sure that 0 is not recorded when the actual time is 1, so there's still a little bit of complication at the caller side. I was thinking I could just use a bit in the _state field to simplify. The waiter count has plenty of bits and one could be used for this instead. The waiter start time is stored and used in only a couple of places so it's probably ok as is, but I'll go ahead and make that change to use a bit in the _state field, it would be a bit cleaner.

Copy link
Member Author

@kouvel kouvel Feb 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the previous code when the ushort field was being used directly:

(ushort)Environment.TickCount - waiterStartTimeMs >= MaxDurationMsForPreemptingWaiters

Doesn't work because of the implicit int promotion in the subtract. Eg. if the recorded start time was (ushort)-100 and the current time as a ushort is 0, the result of the subtract would be a large negative int value instead of 100, and would not trigger the heuristic. Need additional casting here to force ushort math.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe using short instead would simplify, as the int promotion would sign-extend instead, will consider

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That wouldn't work either, this expression would need extra casting it seems

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it would need to be 16bit math.

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@kouvel
Copy link
Member Author

kouvel commented Feb 27, 2024

Updated to use a bit in the _state field instead (see #98876 (comment)), please take a look

@VSadov
Copy link
Member

VSadov commented Feb 28, 2024

Using a bit from the _state seems cleaner.
The remaining bits are enough to track tens millions of waiters/threads, so that should be enough for a while.

Copy link
Member

@VSadov VSadov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@VSadov
Copy link
Member

VSadov commented Feb 28, 2024

If the max number of possible waiters becomes uncomfortable - i.e. if we think millions of threads will be reachable in some extreme cases, I think more bits could be taken from the spinner count.
(Actually, I think we would be ok if we were not tracking the number of spinners at all - that is 7 more free bits, if needed)

@kouvel
Copy link
Member Author

kouvel commented Mar 1, 2024

Rebased

kouvel added 5 commits March 1, 2024 10:08
PR dotnet#97227 introduced a tick count masking issue where the stored waiter start time excludes the upper bit from the ushort tick count, but comparisons with it were not doing the appropriate masking. This was leading to a lock convoy on some heavily contended locks once in a while due to waiters incorrectly appearing to have waited for a long time.

Fixes dotnet#98021
@kouvel kouvel merged commit 77141ae into dotnet:main Mar 7, 2024
179 checks passed
@kouvel kouvel deleted the LockFix branch March 7, 2024 06:06
@github-actions github-actions bot locked and limited conversation to collaborators Apr 7, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
3 participants