Fix for possible false-positive GC delays #8483

Merged: 5 commits, Jun 14, 2023

ntovas (Contributor) commented Jun 12, 2023

In our organization's application, we sporadically saw ".NET Runtime Platform stalled for..." warnings, even with very small GC object counts. After a code review, we suspect that the additional checks in WatchdogHeartbeatTick may be responsible for false-positive GC delay reports.


lastWatchdogCheck = now;

this.participantThread = new Thread(this.RunParticipantCheck)
{
IsBackground = true,
Name = "Orleans.Runtime.Watchdog",
Member

These two threads have the same name but different behavior.

Contributor Author

Changed that, thanks.

ReubenBond (Member) commented Jun 12, 2023

This seems OK to me, but I would rather it be refactored to still use a single thread, just accounting for the time spent checking other participants. For example, by extending the try-finally block surrounding CheckYourOwnHealth to cover the entire WatchdogHeartbeatTick() method, or by moving the block outside WatchdogHeartbeatTick and into a finally block in Run() instead.

EDIT: by the way, I welcome the gist of the change: reducing false-positive warnings.
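
A minimal sketch of the single-thread idea described above, assuming the stall check compares the time since the last recorded timestamp against the heartbeat period plus a tolerance. StallDetector, WatchdogDemo, the 1-second period, and the 500 ms tolerance are all hypothetical and are not taken from the Orleans source. A simulated clock stands in for real time so the result is deterministic.

```csharp
using System;

// Hypothetical illustration only; these types are not part of Orleans.
public sealed class StallDetector
{
    private readonly TimeSpan allowed;
    private TimeSpan lastCheck;

    public StallDetector(TimeSpan period, TimeSpan tolerance)
    {
        // A gap longer than period + tolerance is reported as a stall.
        allowed = period + tolerance;
    }

    // Records the moment the watchdog last finished (or started) a tick.
    public void MarkChecked(TimeSpan now) => lastCheck = now;

    // True when the gap since the last recorded timestamp is too long.
    public bool IsStalled(TimeSpan now) => now - lastCheck > allowed;
}

public static class WatchdogDemo
{
    private static readonly TimeSpan Period = TimeSpan.FromSeconds(1);
    private static readonly TimeSpan Tolerance = TimeSpan.FromMilliseconds(500);

    // Timestamp is recorded first, then the participant checks run,
    // then the thread sleeps for one period. The time spent on the
    // participant checks is counted as if the runtime had stalled.
    public static bool TimestampBeforeWork(TimeSpan work)
    {
        var detector = new StallDetector(Period, Tolerance);
        var clock = TimeSpan.Zero;   // simulated clock
        detector.MarkChecked(clock);
        clock += work;               // participant health checks
        clock += Period;             // simulated sleep
        return detector.IsStalled(clock);
    }

    // Timestamp is recorded in a finally block after the participant
    // checks, so only the sleep interval is measured.
    public static bool TimestampInFinally(TimeSpan work)
    {
        var detector = new StallDetector(Period, Tolerance);
        var clock = TimeSpan.Zero;   // simulated clock
        try
        {
            clock += work;           // participant health checks
        }
        finally
        {
            detector.MarkChecked(clock);
        }
        clock += Period;             // simulated sleep
        return detector.IsStalled(clock);
    }
}
```

With a simulated 2-second participant check, TimestampBeforeWork reports a stall (3 s elapsed against a 1.5 s limit) while TimestampInFinally does not (1 s elapsed), which is the false positive this suggestion aims to remove.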

@ReubenBond ReubenBond self-assigned this Jun 12, 2023
ntovas (Contributor, Author) commented Jun 13, 2023

I believe that using a single thread and delaying the next check may cause false negatives. Since we have already faced many challenges from GC pressure and thread starvation, we would like this check to be as reliable as possible.

this.gcThread = new Thread(this.RunGCCheck)
{
IsBackground = true,
Name = "Orleans.Runtime.Watchdog.GCMonitor",
ReubenBond (Member) commented Jun 13, 2023

GCMonitor is a misnomer, since it monitors for execution stalls in general. Maybe call it StallMonitor or RuntimeMonitor.

@ReubenBond ReubenBond merged commit e961202 into dotnet:main Jun 14, 2023
@github-actions github-actions bot locked and limited conversation to collaborators Dec 2, 2023